SECTION 1: INTRODUCTION



1.1. Introduction 

This manual describes version 7 of the SPARC architecture, Sun Microsystems' 32-bit RISC
architecture. This architecture makes possible implementations that can execute instructions for
high-level language programs at rates approaching 1 instruction per processor clock. It supports
a floating-point coprocessor with multiple arithmetic units and a second, implementation-definable
coprocessor.

1.2. Architecture and Implementation

This document provides a specification for the SPARC architecture; it describes the major
aspects of that architecture. Any design which conforms to this specification is an implementation;
aspects of the design that are not specified in this document are implementation-dependent.
For example, the SPARC architecture defines a set of instructions, a set of registers, how the
registers work, and how traps and interrupts work. It does not define details such as the size
and timing of data and address busses, caches, or memory management units.

Specific information about Sun Microsystems' implementations of the SPARC architecture
appears in companion manuals.



1.3. Features 

The SPARC architecture provides the following features: 

Simple instructions — Most instructions require only a single arithmetic operation.

Few and simple instruction formats — All instructions are 32 bits wide, and are aligned on 
32-bit boundaries in memory. There are only three basic instruction formats, and they 
feature uniform placement of opcode and register address fields. 

Register-intensive architecture — Most instructions operate on either two registers or one 
register and a constant, and place the result in a third register. Only load and store instruc- 
tions access storage. 

A large "windowed" register file — The processor has access to a large number of registers
configured into several overlapping sets. This scheme allows compilers to cache local
values across subroutine calls, and provides a register-based parameter passing mechanism.

Delayed control transfer — The processor always fetches the next instruction after a control 
transfer, and either executes it or annuls it, depending on the transfer's "annul" bit.
Compilers can rearrange code to place a useful instruction after a delayed control transfer and
thereby take better advantage of the processor's pipeline.

One-cycle execution — To take maximum advantage of the SPARC architecture, the
memory system should be able to fetch instructions at an average rate of one per processor
cycle. This allows most instructions to execute in one cycle.

Concurrent floating point — Floating-point operate instructions can execute concurrently with 
each other and with other non-floating-point instructions. 

Coprocessor interface — The architecture supports a simple coprocessor interface. The
coprocessor instruction set is analogous to the floating-point instruction set. 



1.4. Using This Manual 

This section provides information to help you use this manual. It includes an overview of the 
manual, a definition of the intended audience, a description of the fonts used and what they 
mean, a glossary, and a list of references. 



1.4.1. Contents 

The section after this contains an overview of the SPARC architecture. This is followed by sec- 
tions that describe the registers, then the instructions, and finally, trapping and exceptions. 

A series of appendices follow the sections. The most important is Appendix B, Instruction 
Descriptions. This contains a complete description of every instruction that the architecture sup- 
ports, and includes tables showing the recommended assembly language syntax for each 
instruction. Another appendix contains tables detailing all the opcodes and condition codes, and
another contains ISP description language for all the instructions plus other architecture
functions.

1.4.2. Fonts In Text 

In this manual, we use the following fonts to make things clearer: 

Roman font is the normal font used for text. 

Italic font represents either a register class or a field name. For example:

"The rs1 field contains the address of the r register."

It is also used for regular notes, and for references to sections or appendices in
this manual, or to other documents.

Typewriter font is used for the names of certain signals that are defined in the section
SPARC Architecture Overview, and for literals in the appendix Suggested Assembly
Language Syntax. These signal names appear in typewriter font, and contain underbar
characters in the spaces between the words in the name. For example:

The signal bp_reset_in indicates that the system is requesting a reset.

Bold font indicates that a word or phrase requires emphasis. For example: 

"The delay instruction occurs immediately after a control transfer". 

UPPER CASE items may be either acronyms or instruction names. The most common acro- 
nyms appear in the glossary in this section, and the instructions are all listed by name in 
Appendix B. Note that names of some instructions contain both upper case and lower case 
letters. 

Underbar characters between two or more words mean that the words represent an
identifier, which may be a trap, or some other condition. These appear in ordinary text as
well as in the pseudocode examples in the appendices. For example:

"The IU acknowledges the exception by taking an fp_exception trap."

1.4.3. Notes



This manual provides three types of notes: ordinary notes, programming notes, and implementa- 
tion notes. 

Ordinary notes contain incidental information about the current subject; they appear in italic
font.

Programming notes contain incidental information about programming using the SPARC 
architecture; they appear in reduced pitch Roman font. 

Implementation notes contain information which may be specific to an implementation or 
which may differ in different implementations. They also appear in reduced pitch Roman font. 



1.4.4. Glossary 

The following paragraphs list and describe some of the most important words and acronyms 
used in this manual: 

Architecture/implementation — The architecture is the set of operating principles defined in 
this manual. An implementation is any specific design that conforms to the architecture 
defined here. 

Current window — The block of 24 r registers currently pointed to by the CWP. 

Current Window Pointer (CWP) — Selects the current register window. 

Delay instruction — The instruction immediately following a control transfer. This instruction 
is always fetched, and is either executed or annulled before the control transfer takes place. 

Floating-Point Unit (FPU) — The coprocessor that performs floating-point calculations. 

Floating-Point Arithmetic Unit (FAU) — A subsection of the FPU that executes floating-point
operate instructions.

Floating-Point Operate (FPop) instruction — An instruction that performs a floating-point
calculation. FPop instructions do not include loads and stores between memory and the FPU.

Floating-Point Queue (FQ) — The queue where information about floating-point operate
instructions is held while they are being executed by the FPU.

f register — One of the 32 FPU working registers.

Global registers — A block of 8 registers that are available regardless of the value of the 
current window pointer. 

Integer Unit (IU) — The main computing engine. It fetches all instructions, and executes all
but FPop and CPop instructions.

Next Program Counter (nPC) — Contains the address of the instruction to be executed next
(assuming a trap does not occur).

Processor — The combination of the IU and FPU.

Processor State Register (PSR) — The IU's status register.

Program Counter (PC) — Contains the address of the current instruction being executed by
the IU.

r register — A global register or a register in the IU's current window.

rd, rs1 and rs2 — Fields in instructions. These specify the register operands of an instruction.
rd is the destination register and rs1 and rs2 are the source registers.

r[rd], r[rs1] and r[rs2] — The r registers specified by rd, rs1 and rs2.

Word — A word is 32 bits.

1.4.5. References 

For additional information about RISC architecture, see: 

"Reduced Instruction Set Computers", Communications of the ACM, Volume 28, Number 1, 
January, 1985 by Dave Patterson. 






SECTION 2: SPARC ARCHITECTURE OVERVIEW 



2.1. Introduction 

The SPARC architecture is used in 32-bit Reduced Instruction Set Computers (RISCs). It
provides an Integer Unit (IU) to perform basic processing and a Floating-Point Unit (FPU) to perform
floating-point calculations concurrently with the IU. It also provides instruction set support for an
optional coprocessor. The details of the coprocessor itself are implementation-specific.

A typical system that uses the SPARC architecture is organized around a 32-bit virtual address 
bus and a 32-bit instruction/data bus. Its storage subsystem consists of a memory management
unit (MMU) and a large cache for both instructions and data. The cache is virtual-address-based. 
Depending on the storage subsystem's interpretation of the processor's address space identifier 
(asi) bits, I/O registers are either addressed directly, bypassing the MMU, or they are mapped by 
the MMU into virtual addresses. 



2.2. IU, FPU, and CP

The IU is the basic processing engine of the SPARC architecture. It executes all the instruction
set except floating-point operate instructions and coprocessor instructions. A block diagram of
the IU appears in Figure 2-1.

The FPU performs floating-point arithmetic using several floating-point arithmetic units (FAUs) to 
perform the actual calculations. The number of these units, which is implementation-dependent, 
determines the minimum number of floating-point operate instructions that can be executed at
the same time. 

The FPU and the IU operate concurrently. The FPU recognizes floating-point operate
instructions and places them into a queue. Meanwhile, the IU continues to execute instructions.
Floating-point operate instructions are executed from the queue when the specified floating-point
registers are free and the required FAU is available. If the FPU encounters a floating-point
operate instruction that doesn't fit in the queue, the IU stalls until the required FPU resource
becomes available.

Floating-point load/store instructions are used to move data between the FPU and memory. The
IU generates a memory address and the FPU either sources or sinks the data. Note that
floating-point loads and stores are not floating-point operate instructions.

The architecture hides floating-point concurrency from the programmer, so the implementation 
must provide the appropriate register interlocks. A program including floating-point
computations generates the same results as if all instructions were executed sequentially.

The architecture supports an optional coprocessor. Like the FPU, the coprocessor recognizes
coprocessor arithmetic instructions, and executes them concurrently with instructions executed
by the IU.

Likewise, coprocessor load/store instructions are used to move data between the coprocessor
and memory. For each floating-point load/store instruction, there is an analogous coprocessor
load/store instruction.

2.3. Registers

The register structure forms an important part of the overall architecture. The IU's working
registers are divided into several windows, each with twenty-four 32-bit working registers, and each
having access to the same eight 32-bit global registers. The current window pointer (CWP) field
in the processor state register (PSR) keeps track of which window is currently "active".

In addition to the window registers and global registers, the SPARC architecture provides several 
control and status registers, and a non-windowed working register file for the FPU. 



2.4. Multitasking Support 

The SPARC architecture supports a multitasking operating system by providing user and
supervisor modes. Some instructions are privileged, and can only be executed while the processor is in
supervisor mode. Changing from user to supervisor mode requires taking a hardware trap, or
using a trap instruction.



2.5. Instruction Categories 

Instructions fall into six basic categories: 

1 Load and store 

2 Arithmetic/logical/shift 

3 Control-transfer 

4 Read/write control register 

5 Floating-point operate 

6 Coprocessor operate 

The following sections describe each briefly; for more detail, see the section Instructions. 

2.5.1. Load and Store Instructions 

Load and store instructions are the only instructions that access memory. They use two IU
registers or an IU register and a signed immediate value to calculate the memory address. The
instruction's destination field specifies either an IU register, FPU register, or coprocessor register;
this register supplies the data for a store, or receives the data from a load.

Integer load and store instructions support byte, halfword (16-bit), word (32-bit), and doubleword 
(64-bit) accesses. Floating-point and coprocessor load and store instructions support word and 
doubleword memory accesses. Halfword accesses must be aligned on a 2-byte boundary, word 
accesses must be aligned on a 4-byte boundary, and doubleword accesses must be aligned on 
an 8-byte boundary. Improperly aligned addresses cause load or store instructions to trap. 
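
The alignment rule above can be sketched as a simple check. This is an illustrative model only: the names ALIGNMENT, is_aligned, and AlignmentTrap are hypothetical and are not defined by the architecture, and the sketch does not model which trap type an implementation takes.

```python
# Required alignment, in bytes, for each access size named in the text.
ALIGNMENT = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}


class AlignmentTrap(Exception):
    """Hypothetical stand-in for the trap taken on a misaligned access."""


def is_aligned(address: int, access: str) -> bool:
    """Return True if address is a multiple of the access size."""
    return address % ALIGNMENT[access] == 0


def check_access(address: int, access: str) -> int:
    """Return the address unchanged, or raise if it is improperly aligned."""
    if not is_aligned(address, access):
        raise AlignmentTrap(f"{access} access at {address:#x} is misaligned")
    return address
```

For example, address 0x1002 is acceptable for a halfword access but not for a word or doubleword access.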

The order of bytes, halfwords, and words appears in Figure 4-2.

2.5.2. Arithmetic/Logical/Shift

These instructions (with one exception) compute a result that is a function of two source
operands; they either write the result into a destination register or discard it. They perform
arithmetic, tagged arithmetic, logical, or shift operations. The exception is a specialized
instruction used to create 32-bit constants in two instructions.

Shift instructions can be used to shift the contents of a register left or right, by a distance
specified by the instruction or by an IU register.

The tagged arithmetic instructions assume that the least-significant two bits of the operands are
tags and set a condition code bit if they are not zero.
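
The tag check described above can be sketched as follows. The function names and the returned flag are hypothetical; the sketch models only the test of the two low-order tag bits and does not model which condition code bit is set or any other effect of a real tagged instruction.

```python
TAG_MASK = 0b11          # the least-significant two bits of an operand
MASK32 = 0xFFFFFFFF      # results are 32 bits wide


def has_nonzero_tag(a: int, b: int) -> bool:
    """Return True if either operand has a nonzero tag field."""
    return ((a | b) & TAG_MASK) != 0


def tagged_add(a: int, b: int) -> tuple[int, bool]:
    """Add two 32-bit operands and report whether a tag was nonzero."""
    result = (a + b) & MASK32
    return result, has_nonzero_tag(a, b)
```

With operands 4 and 8 (both tags zero) the flag is clear; with operands 5 and 8 the flag is set because the first operand's tag is 01.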



2.5.3. Control-Transfer Instructions

Control-transfer instructions include jumps, calls, traps, and branches. Control transfer is usually
delayed so that the instruction immediately following the control transfer is executed before
control actually transfers to the target address. The instruction following the control-transfer
instruction is called a delay instruction. The delay instruction is always fetched, even when the control
transfer is an unconditional branch. However, a bit in the control-transfer instruction can cause
the delay instruction to be annulled (i.e. to have no effect) if the branch is not taken (or in one
case, if the branch is taken).

Branch and call instructions use PC-relative displacements. The jump and link (JMPL) instruction
uses a register-indirect displacement: it computes its target address as either the sum of two
registers, or the sum of a register and a 13-bit signed immediate. The branch instruction provides
a displacement of ±8 Mbytes, while the call instruction's 30-bit word displacement allows a
transfer to an arbitrary address.
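
The address arithmetic above can be sketched as shown here. The function names are hypothetical. The 22-bit width of the branch displacement is an assumption inferred from the stated ±8 Mbyte range with word-aligned instructions (2^21 words of 4 bytes each way); the 30-bit call displacement and the 13-bit signed immediate come from the text.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def branch_target(pc: int, disp22: int) -> int:
    """PC-relative branch: word displacement, assumed to be 22 bits wide."""
    return (pc + 4 * sign_extend(disp22, 22)) & MASK32


def call_target(pc: int, disp30: int) -> int:
    """PC-relative call: a 30-bit word displacement reaches any word address."""
    return (pc + 4 * sign_extend(disp30, 30)) & MASK32


def jmpl_target(rs1_value: int, simm13: int) -> int:
    """Register-indirect jump: register plus 13-bit signed immediate."""
    return (rs1_value + sign_extend(simm13, 13)) & MASK32
```

Under these assumptions the most negative branch displacement moves the PC back by exactly 8 Mbytes, and a 30-bit word displacement multiplied by 4 covers the whole 32-bit address space.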



2.5.4. Read/Write Control Register

The SPARC architecture provides instructions to read and write the contents of the various
control registers. For reads and writes, the source and destination (respectively) are implied by the
instruction itself.



2.5.5. Floating-point and Coprocessor Operate Instructions

Floating-point operate instructions perform all floating-point calculations. These are
register-to-register instructions that use the floating-point registers. Like arithmetic/logical/shift
instructions, these also compute some result that is a function of two source operands. However,
they always write the result into a destination register.

Floating-point operate instructions execute concurrently with IU instructions and possibly with
other floating-point instructions. A particular floating-point operate instruction is specified by a
subfield of the FPop instructions.

Coprocessor arithmetic instructions are defined by the implemented coprocessor, if any. They
are specified by the CPop instruction. The architecture supports 1024 distinct coprocessor
arithmetic instructions.

Floating-point loads and stores are NOT floating-point operate instructions (FPops), and
coprocessor loads and stores are NOT coprocessor operate instructions. Floating-point and
coprocessor loads and stores fall in the category "loads and stores".

Because the IU and the FPU can execute instructions concurrently, when a floating-point
exception occurs, the program counter usually does not contain the address of the floating-point
instruction that caused the exception. However, the first element of the floating-point queue
points to the instruction that caused the exception, and the remaining elements point to
floating-point operate instructions that have not yet completed. These can be re-executed or emulated.

Likewise, if the coprocessor executes instructions concurrently with the IU, the coprocessor can
support a queue that, at the time of a coprocessor exception, will contain the instruction that
generated the exception and remaining, unexecuted coprocessor instructions.

2.6. Processor Data Types 

The architecture defines nine data types; these appear in Figure 2-2. The integer types include 
byte, unsigned byte, halfword, unsigned halfword, word and unsigned word. The 
ANSI/IEEE 754-1985 floating-point types include single, double, and extended. A byte is 8 bits 
wide, a halfword is 16 bits, a word is 32 bits, a double is 64 bits, and an extended is 128 bits. 

The floating-point double type includes two subfields: 1) the double-e, which contains the sign,
exponent, and high-order fraction, and 2) the double-f, which includes the low-order fraction.
The floating-point extended type includes 4 subfields: 1) the extended-e, which contains the sign
and exponent, 2) the extended-f, which contains the integer part of the mantissa, and the
high-order part of the fraction, 3) the extended-f-low, which contains the low-order fraction, and 4) the
extended-u which is unused.
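
As an illustration of the double subfields, the sketch below splits a double into its double-e and double-f words and then extracts the sign, exponent, and fraction. It assumes that double-e is the word at the lower address and that bytes are stored most-significant first, and it relies on Python's struct module to supply the IEEE 754 bit pattern. The function names are hypothetical.

```python
import struct


def split_double(value: float) -> tuple[int, int]:
    """Return (double_e, double_f): the high and low 32-bit words of a double."""
    double_e, double_f = struct.unpack(">II", struct.pack(">d", value))
    return double_e, double_f


def double_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, 52-bit fraction) of a double."""
    double_e, double_f = split_double(value)
    sign = double_e >> 31                      # 1 bit
    exponent = (double_e >> 20) & 0x7FF        # 11 bits
    fraction = ((double_e & 0xFFFFF) << 32) | double_f   # 20 + 32 bits
    return sign, exponent, fraction
```

For the value 1.0, double-e is 0x3FF00000 (sign 0, biased exponent 1023, high-order fraction 0) and double-f is 0.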

The following tables show a) the double and extended types in memory, b) the single-, double-,
and extended-precision formats, and c) the processor data types:

    Data type             Fields (bit positions)

    Byte                  s <7>, b <6:0>
    Unsigned Byte         b <7:0>
    Halfword              s <15>, h <14:0>
    Unsigned Halfword     h <15:0>
    Word                  s <31>, w <30:0>
    Unsigned Word         w <31:0>
    Single                s <31>, e <30:23>, f <22:0>
    Double
      Double-e            s <31>, e <30:20>, f-msb <19:0>
      Double-f            f-lsb <31:0>
    Extended Precision
      Extended-e          s <31>, e <30:16>, unused/reserved <15:0>
      Extended-f          j <31>, f-msb <30:0>
      Extended-f-low      f-lsb <31:0>
      Extended-u          unused/reserved <31:0>

    Subfield              Address

    double-e              n
    double-f              n+4

    extended-e            n
    extended-f            n+4
    extended-f-low        n+8
    extended-u            n+12

Single precision:

    s = sign (1)
    e = biased exponent (8)
    f = fraction (23)

    normalized number (0 < e < 255):   (-1)^s * 2^(e-127) * 1.f
    subnormal number (e = 0):          (-1)^s * 2^(-126) * 0.f
    zero (e = 0):                      (-1)^s * 0

    signaling NaN:   s = u; e = 255 (max); f = .0uuu...uu (at least one bit must be nonzero)
    quiet NaN:       s = u; e = 255 (max); f = .1uuu...uu
    infinity:        s = u; e = 255 (max); f = .000...00 (all zeroes)

Double precision:

    s = sign (1)
    e = biased exponent (11)
    f-msb ... f-lsb = f = fraction (52)

    normalized number (0 < e < 2047):  (-1)^s * 2^(e-1023) * 1.f
    subnormal number (e = 0):          (-1)^s * 2^(-1022) * 0.f
    zero (e = 0):                      (-1)^s * 0

    signaling NaN:   s = u; e = 2047 (max); f = .0uuu...uu (at least one bit must be nonzero)
    quiet NaN:       s = u; e = 2047 (max); f = .1uuu...uu
    infinity:        s = u; e = 2047 (max); f = .000...00 (all zeroes)

Extended precision:

    s = sign (1)
    e = biased exponent (15)
    j = integer part (1)
    f-msb ... f-lsb = f = fraction (63)

    normalized number (0 < e < 32767; j = 1)†:  (-1)^s * 2^(e-16383) * j.f
    subnormal number (e = 0; j = 0):            (-1)^s * 2^(-16383) * j.f
    zero (j = 0; e = 0):                        (-1)^s * 0

    signaling NaN:   s = u; e = 32767 (max); j = u; f = .0uuu...uu (at least one bit must be nonzero)
    quiet NaN:       s = u; e = 32767 (max); j = u; f = .1uuu...uu
    infinity:        s = u; e = 32767 (max); j = u; f = .000...00 (all zeroes)
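
The single-precision rules can be exercised with a short decoder. This is a sketch under the ANSI/IEEE 754-1985 definitions (exponent bias 127, subnormal scale 2^(-126)); the function name decode_single is hypothetical, and signaling and quiet NaNs are not distinguished because both decode to a NaN value here.

```python
import math
import struct


def decode_single(word: int) -> float:
    """Decode a 32-bit single-precision pattern into a Python float."""
    s = (word >> 31) & 1          # sign (1 bit)
    e = (word >> 23) & 0xFF       # biased exponent (8 bits)
    f = word & 0x7FFFFF           # fraction (23 bits)
    sign = -1.0 if s else 1.0
    if e == 255:                  # maximum exponent: infinity or NaN
        return sign * math.inf if f == 0 else math.nan
    if e == 0:                    # zero or subnormal: 0.f scaled by 2^-126
        return sign * 2.0 ** -126 * (f / 2 ** 23)
    return sign * 2.0 ** (e - 127) * (1 + f / 2 ** 23)   # normalized: 1.f


def reference_single(word: int) -> float:
    """Decode the same pattern with the struct module, for comparison."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]
```

The pattern 0x3F800000 (s = 0, e = 127, f = 0) decodes to 1.0 by the normalized rule, and 0x00000001 decodes to 2^(-149) by the subnormal rule.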



2.7. Traps and Exceptions 

SPARC supports three types of traps: synchronous, floating-point/coprocessor and asyn- 
chronous (asynchronous traps are also called interrupts). 

Synchronous traps are caused by an instruction, and occur before the instruction is
completed.

Floating-point/coprocessor traps are caused by a floating-point operate (FPop) or
coprocessor (CPop) instruction, and occur before the instruction is completed. However, due to the
concurrent operation of the IU and the FPU, other non-floating-point instructions may have
executed in the meantime.

Asynchronous traps occur when an external event interrupts the processor; they are not 
related to any particular instruction and occur between the execution of instructions. 

Synchronous and floating-point/coprocessor traps are generally taken before the instruction
changes any processor or system state visible to a programmer; they happen "between"
instructions. Instructions which access memory twice (double loads and stores and atomic instructions)
are the only exceptions.

Traps transfer control to an offset within a table. The base address is specified in the trap base
register (TBR), and the offset depends on the type of trap. Reset traps, however, cause the
processor to transfer control to address 0. Because the program counters are not updated until after
an instruction completes, the trap hardware captures both program counters and guarantees that
the PC points to either the instruction that caused a synchronous trap, or to the instruction that
was about to execute when a floating-point/coprocessor or asynchronous trap occurred. For
floating-point/coprocessor traps, the instruction that caused the trap is in the floating-point queue
(FQ) or the coprocessor queue (CP), and the PC will usually not point to it.

Traps are described in the section Traps, Exceptions, and Error Handling. 



2.8. System Interface 

The SPARC architecture does not define many of the standard signals, such as bus grant and 
request lines, or acknowledges; these may differ among implementations. However, it does 
define the following signals, which are used by the instruction set:



† The architecture does not define or create results with 0 < e < 32767, j = 0.

bp_IRL<3:0>

This external signal presents an asynchronous interrupt request to the processor. Level 0
indicates that no interrupt is being requested, and levels 1 through 15 request interrupts,
with level 15 having the highest priority. Level 15 is non-maskable unless all traps are
disabled. The interrupt acknowledge signal is implementation-dependent.

bp_reset_in

This signal indicates that the external system is requesting a reset. The processor responds
by entering reset_mode and clearing pb_error.

pb_error 

The processor asserts this signal when it is in error_mode. 

pb_retain_bus 

The processor asserts this signal to ensure that the memory bus logic will not relinquish the 
bus. 

bp_FPU_present 

This signal indicates that the FPU is present. 

bp_CP_present 

This signal indicates that a coprocessor is present. 

bp_I_cache_present

This signal indicates that there is an external instruction cache present. The IFLUSH
instruction uses this signal.

bp_CP_exception 

The coprocessor asserts this signal in order to cause a cp_exception trap. An implementa- 
tion may delay the taking of the trap to the next CPop instruction. 

bp_CP_cc<1:0>

The coprocessor supplies these condition codes for the coprocessor branch instruction
(CBccc).

bp_memory_access_exception 

The memory system asserts this signal when the memory system is unable to provide the
data at the requested address. The assertion of this signal will cause either an 
instruction_access_exception or a data_access_exception trap. 

SECTION 3: REGISTERS



3.1. Introduction 

The integer unit has two types of registers associated with it: working registers (r registers) and
control/status registers. Working registers are used for normal operations, and control/status
registers keep track of and control the state of the IU. The FPU has 32 working registers (called f
registers), and two control/status registers: the Floating-point State Register (FSR), and the
Floating-point Queue (FQ).



3.2. Integer Unit r Registers 

All r registers are 32 bits wide. They are divided into 8 global registers and a number of blocks 
called windows. Each window contains 24 r registers. 

The number of windows (NWINDOWS) ranges from 2 to 32 depending on the implementation.
Implemented windows must be contiguously numbered from 0 to NWINDOWS - 1.



3.2.1. Programming Note 

At most NWINDOWS - 1 windows are available to user code since one window must be available
for trap handlers.

The windows are addressed by the CWP, a field of the Processor State Register (PSR). The 
CWP is incremented by a RESTORE or RETT instruction and decremented by a SAVE instruc- 
tion. The active window is defined as the window currently pointed to by the CWP. 

The Window Invalid Mask (WIM) is a register which, under software control, detects the 
occurrence of IU register file overflows and underflows.

The registers in each window are divided into ins, outs, and locals. Note that the globals, while 
not really part of any particular window, can be addressed when any window is active. When 
any particular window is active, the registers are addressed as follows: 



    Register numbers      Name

    r[24] to r[31]        ins
    r[16] to r[23]        locals
    r[8] to r[15]         outs
    r[0] to r[7]          globals



Each window shares its ins and outs with adjacent windows. The outs from a previous window 
(CWP +1) are the ins of the current window, and the outs from the current window are the ins for 
the next window (CWP -1). The globals are equally available from all windows, and the locals 
are unique to each window. 

The register addresses overlap such that, given a register with address o where
8 <= o <= 15, o refers to exactly the same register as (o + 16) after the CWP is decremented by 1
modulo NWINDOWS (points to the next window). Likewise, given a register with address i
where 24 <= i <= 31, i refers to exactly the same register as address (i - 16) after the CWP is
incremented by 1 modulo NWINDOWS (points to the previous window).

The windows are joined together in a circular stack, where the highest numbered window is
adjacent to the lowest. If NWINDOWS = 8, the outs of window 7 are the ins of window 0. Figures 3-1
and 3-2 show the relationships.
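
The overlap rule for register addresses can be modeled by mapping each (CWP, register number) pair to an index in a flat register file. This is only a sketch: the layout of the flat file, the name physical_index, and the choice of NWINDOWS = 8 are assumptions made for illustration, and a real implementation may organize its register file differently.

```python
NWINDOWS = 8   # implementation-dependent; 8 is used for illustration


def physical_index(cwp: int, r: int) -> int:
    """Map register r[r] in window cwp to an index in a flat register file.

    Indices 0..7 hold the globals. Each window then owns 16 registers:
    its 8 ins followed by its 8 locals. A window's outs are not stored
    separately; they alias the ins of window (cwp - 1) mod NWINDOWS.
    """
    if not 0 <= r <= 31:
        raise ValueError("register number must be in 0..31")
    if r < 8:                                   # globals
        return r
    if r >= 24:                                 # ins
        return 8 + 16 * cwp + (r - 24)
    if r >= 16:                                 # locals
        return 8 + 16 * cwp + 8 + (r - 16)
    next_window = (cwp - 1) % NWINDOWS          # outs alias the next window's ins
    return 8 + 16 * next_window + (r - 8)
```

In this model r[o] in one window and r[o + 16] in the window reached by decrementing the CWP map to the same index, so the flat file needs only 8 + 16 * NWINDOWS entries.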

3.3. Special r registers 

The utilization of two r registers is partially fixed by the instruction set: 

If global register r[0] is addressed as a source operand (rs1 or rs2 = 0), the operand value 0
is returned. If r[0] is addressed as a destination operand (rd = 0), no register is modified.

The CALL instruction writes its own address into out register r[15]. 

Also note that traps save the program counters (PC and nPC) into two locals of the next window.
This is described in the section Traps, Exceptions, and Error Handling. 



3.3.1. Programming Notes 

Because the processor logically provides new locals and outs after every procedure call, register 
local values need not be saved and restored across calls. The overlap registers also minimize 
the overhead of passing and returning values. They can be used as follows: 

In preparation for a procedure call, a routine generally moves the parameters into its out 
registers. After the CALL, the CWP is decremented with the SAVE instruction, what was the 
next window becomes the active window, and the parameters are directly accessible by the 
callee, since the caller's outs are the callee's ins. 

Likewise, in preparing for a procedure return, a routine generally moves its result(s) into its in 
registers. After the CWP is incremented via the RESTORE instruction, what was the previ- 
ous window becomes the active window, and the return values are accessible by the retur- 
nee, because the returner's ins are the returnee's outs. Note that the terms ins and outs are 
defined relative to calling, not returning. 

Since any implementation has only a finite number of windows, the register file becomes full after 
the number of procedure calls exceeds the number of returns by NWINDOWS - 1. A subse- 
quent call causes the operating system to move one or more (in and local sets of) windows from 
the register file into memory. The SAVE instruction automatically checks for the 
window_overflow condition. 

Similarly, the register file can become empty when the number of procedure returns exceeds the
number of calls by NWINDOWS - 1. A subsequent return causes one or more previously saved
windows to be moved from memory into the register file. The RESTORE instruction
automatically checks for the window_underflow condition. The architecture works best with efficient
window_overflow and window_underflow handlers.

NOTE

By software convention, you can provide additional locals (and 
consequently, fewer ins and outs). For example, software can 
assume that the boundary is actually between r[26] and r[27], 
providing 6 outs, 10 locals, and 6 ins. 

    previous window (CWP + 1)

    r[31]
      :    ins
    r[24]

    r[23]
      :    locals
    r[16]
                      active window (CWP)

    r[15]             r[31]
      :    outs         :    ins
    r[8]              r[24]

                      r[23]
                        :    locals
                      r[16]
                                        next window (CWP - 1)

                      r[15]             r[31]
                        :    outs         :    ins
                      r[8]              r[24]

                                        r[23]
                                          :    locals
                                        r[16]

                                        r[15]
                                          :    outs
                                        r[8]


In this figure, NWINDOWS = 8. It does not show the 8 globals. If the procedure corresponding to
the window labeled w0 does a procedure call (executes a SAVE instruction), a window_overflow
trap will occur. The overflow trap handler uses the locals of w7:

    CWP = 0                    active window = 0
    CWP + 1 = 1                previous window = 1
    CWP - 1 = 7                next window = 7
    WIM = 10000000 (binary)    trap window = 7
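
The overflow case above can be sketched as follows. The sketch assumes that bit n of the WIM marks window n as invalid and that SAVE and RESTORE test the bit for the window they are about to enter; the function names and the WindowTrap exception are hypothetical, and the trap handler itself is not modeled.

```python
NWINDOWS = 8


class WindowTrap(Exception):
    """Hypothetical stand-in for a window_overflow or window_underflow trap."""


def save(cwp: int, wim: int) -> int:
    """Decrement the CWP, trapping if the new window is marked in the WIM."""
    new_cwp = (cwp - 1) % NWINDOWS
    if (wim >> new_cwp) & 1:
        raise WindowTrap("window_overflow")
    return new_cwp


def restore(cwp: int, wim: int) -> int:
    """Increment the CWP, trapping if the new window is marked in the WIM."""
    new_cwp = (cwp + 1) % NWINDOWS
    if (wim >> new_cwp) & 1:
        raise WindowTrap("window_underflow")
    return new_cwp
```

With CWP = 0 and WIM = 10000000 (binary), save() would enter window 7, which is marked, so the sketch raises the overflow condition; restore() from window 0 moves to window 1 without a trap.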

3.4. Integer Unit Control/Status Registers

The IU's control/status registers are all 32-bit read/write registers unless specified otherwise.
They include the program counters (PC and nPC), the Processor State Register (PSR), the Win- 
dow Invalid Mask register (WIM), the Trap Base Register (TBR), and the multiply-step (Y) regis- 
ter. 

NOTE

Control/status registers contain two types of fields, mode and
status. Mode fields are set by the programmer; they appear in
UPPER CASE (for example, PIL). Status fields appear in lower
case italic font (for example, ver).



3.4.1. Integer Program Counters (PC and nPC) 

The Program Counter (PC) contains the address of the instruction currently being executed by
the IU, and the nPC holds the address of the next instruction to be executed (assuming a trap
does not occur).

In delayed control transfers, the instruction that immediately follows a control transfer may be 
executed before control is transferred to the target. The nPC is necessary to implement this 
feature. 



3.4.2. Processor State Register (PSR) 

This 32-bit register contains various fields describing the state of the IU. It can be modified by
the SAVE, RESTORE, Ticc and RETT instructions, or by instructions that modify the condition
codes. The (privileged) instructions RDPSR and WRPSR read and write it directly.

The PSR provides the following fields:

    Field   impl    ver     icc     reserved   EC   EF   PIL    S   PS   ET   CWP
    Bits    31:28   27:24   23:20   19:14      13   12   11:8   7   6    5    4:0


impl Bits 31 through 28 identify the implementation number of the processor. The WRPSR
instruction does not modify this field.

ver Bits 27 through 24 contain a constant; the meaning of this constant depends on the value of
the impl field. The WRPSR instruction does not modify this field.

icc Bits 23 through 20 contain the integer unit's condition codes. These bits are modified by
the WRPSR instruction, and by arithmetic and logical instructions whose names end with the
letters cc (for example, ANDcc). The Bicc and Ticc instructions base their control transfer on
these bits, which are defined as follows:

    Field   n    z    v    c
    Bit     23   22   21   20



Negative (n)

Bit 23 indicates whether the ALU result was negative for the last instruction that modified the
icc field. 1 = negative, 0 = not negative.

Zero (z)

Bit 22 indicates whether the ALU result was zero for the last instruction that modified the icc
field. 1 = result was zero, 0 = result was nonzero.

Overflow (v)

If bit 21 is 1, it indicates that an arithmetic overflow occurred during the last instruction that
modified the icc field. If bit 21 is 0, this indicates that an arithmetic overflow did not occur.
Logical instructions that modify the icc field always set the overflow bit to 0.

Carry (c)

If bit 20 is 1, it indicates that either an arithmetic carry out of bit 31 occurred as the result of
the last addition that modified the icc, or that a borrow into bit 31 occurred as the result of
the last subtraction that modified the icc. If bit 20 is 0, this indicates that a carry did not
occur. Logical instructions that modify the icc field always set the carry bit to 0.

reserved

Bits 19 through 14 are reserved. This field should only be written to by the WRPSR
instruction.

EC This bit determines whether the coprocessor is enabled or disabled.
1 = enabled, 0 = disabled.

EF This bit determines whether the FPU is enabled or disabled.
1 = enabled, 0 = disabled.



3.4.3. Programming Note 

If the FPU is either disabled, or enabled and not present, an FPop, FBfcc, or floating-point 
load/store instruction causes an fp_disabled trap. Similarly, if the coprocessor is either dis- 
abled, or enabled and not present, a CPop, CBccc, or coprocessor load/store instruction 
causes a cp_disabled trap. 

When the FPU (or CP) is disabled, it retains its state until it is reenabled or reset. When
disabled, the FPU can continue to execute instructions in its queue. The CP can also, if it has
a queue.

When the FPU is present, software can use the EF bit to determine whether a particular pro- 
cess uses the FPU. If a process does not use the FPU, the FPU's registers need not be 
saved and restored across context switches. Also, if the FPU is not present, (as indicated by 
the bp_FPU_present signal), the fp_disabled trap can be used to emulate the floating-point 
instruction set. (This also applies to the coprocessor.) 

PIL Bits 11 through 8 identify the processor interrupt level. The processor only accepts inter- 
rupts whose interrupt level is greater than the value in PIL. Bit 11 is the MSB and bit 8 is the 
LSB. 

S Bit 7 determines whether the processor is in supervisor mode: when S = 1, the processor is 
in supervisor mode. Note that because the instructions to write the PSR are only available in 
supervisor mode, supervisor mode can only be entered by a software or hardware trap. 

PS Bit 6 contains the value of the S bit at the time of the most recent trap. 

ET Bit 5 is the Trap Enable bit. When ET = 1, traps are enabled. When ET = 0, traps are
disabled, and all asynchronous traps are ignored. Synchronous traps and floating-point/coprocessor
traps cause the IU to halt and enter error_mode. (See Appendix C for a definition of error_mode.)



3.4.4. Programming Note

If traps are enabled (ET=1), some care must be taken when you disable them (ET=0). Since
the "RDPSR, WRPSR" instruction sequence is interruptible, it may not be appropriate in
some situations. Here are two alternatives: 1) generate a "trap_instruction" trap instead
(this disables traps); or 2) use the "RDPSR, WRPSR" sequence and write the interrupt trap
handlers so that before they return to the supervisor, they restore the PSR to the value it had
when the interrupt handler was entered. Note that the PS bit cannot be restored. In alternative
(1), the trap handler should verify that it was called from the supervisor state before
returning to the supervisor.



CWP Bits 4 through 0 comprise the Current Window Pointer, which points to the current active r
register window. It is decremented by traps and the SAVE instruction, and incremented by
RESTORE and RETT instructions.

The CWP cannot point to an unimplemented window; therefore arithmetic on the CWP is 
done modulo the number of implemented windows (NWINDOWS). 
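
The PSR layout described above can be unpacked with shifts and masks. The following is a minimal sketch in C, assuming only the bit positions listed in this section. The names psr_fields, field, psr_decode, and psr_accepts_interrupt are illustrative and are not part of the architecture. The interrupt test follows only the two rules stated here (ET must be 1, and the level must be greater than PIL).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the PSR fields described in this section. */
typedef struct {
    unsigned impl, ver;    /* bits 31:28 and 27:24 */
    unsigned n, z, v, c;   /* icc: bits 23, 22, 21, 20 */
    unsigned ec, ef;       /* bits 13 and 12 */
    unsigned pil;          /* bits 11:8 */
    unsigned s, ps, et;    /* bits 7, 6, 5 */
    unsigned cwp;          /* bits 4:0 */
} psr_fields;

/* Extract bits hi..lo (inclusive) of a 32-bit word. */
static unsigned field(uint32_t word, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    return (unsigned)((word >> lo) & (0xFFFFFFFFu >> (31u - (hi - lo))));
}

static psr_fields psr_decode(uint32_t psr)
{
    psr_fields f;
    f.impl = field(psr, 31, 28);
    f.ver  = field(psr, 27, 24);
    f.n    = field(psr, 23, 23);
    f.z    = field(psr, 22, 22);
    f.v    = field(psr, 21, 21);
    f.c    = field(psr, 20, 20);
    f.ec   = field(psr, 13, 13);
    f.ef   = field(psr, 12, 12);
    f.pil  = field(psr, 11, 8);
    f.s    = field(psr, 7, 7);
    f.ps   = field(psr, 6, 6);
    f.et   = field(psr, 5, 5);
    f.cwp  = field(psr, 4, 0);
    return f;
}

/* An interrupt is accepted only if traps are enabled and level > PIL. */
static int psr_accepts_interrupt(uint32_t psr, unsigned level)
{
    psr_fields f = psr_decode(psr);
    return f.et == 1u && level > f.pil;
}
```

For example, the value 0x12A015A3 decodes to impl = 1, ver = 2, n = 1, v = 1, EF = 1, PIL = 5, S = 1, ET = 1, and CWP = 3.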



3.4.5. Window Invalid Mask Register (WIM)

This register is used to determine whether a window_overflow or window_underflow trap should
be generated by a SAVE, RESTORE, or RETT instruction. Each bit in the WIM register
corresponds to a window. For example, bit 0 corresponds to window 0 (CWP = 0), bit 1
corresponds to window 1 (CWP = 1), and so on. If a SAVE, RESTORE, or RETT would cause
the CWP to point to a window whose corresponding WIM bit equals 1, it causes a
window_overflow (SAVE) or window_underflow (RESTORE, RETT) trap.

This register can be read by the RDWIM instruction, and written by the WRWIM instruction. Bits 
corresponding to unimplemented windows read as zeroes and values written to unimplemented 
bits are ignored. 
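
The WIM check can be sketched in C as follows. This is an illustrative model only: the names window_result, try_save, and try_restore are assumptions, and the sketch models just the check and the modulo-NWINDOWS update of the CWP. It does not model the trap itself, which would decrement the CWP again on entry to the handler.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { WINDOW_OK, WINDOW_OVERFLOW, WINDOW_UNDERFLOW } window_result;

/* Nonzero if the WIM bit for the given window is set. */
static int window_invalid(uint32_t wim, unsigned window)
{
    return (int)((wim >> window) & 1u);
}

/* SAVE: move to window CWP-1 (modulo nwindows) unless WIM marks it invalid. */
static window_result try_save(unsigned *cwp, uint32_t wim, unsigned nwindows)
{
    assert(cwp != 0 && nwindows >= 1 && nwindows <= 32 && *cwp < nwindows);
    unsigned target = (*cwp + nwindows - 1u) % nwindows;
    if (window_invalid(wim, target))
        return WINDOW_OVERFLOW;      /* CWP is left unchanged by this sketch */
    *cwp = target;
    return WINDOW_OK;
}

/* RESTORE: move to window CWP+1 (modulo nwindows) unless WIM marks it invalid. */
static window_result try_restore(unsigned *cwp, uint32_t wim, unsigned nwindows)
{
    assert(cwp != 0 && nwindows >= 1 && nwindows <= 32 && *cwp < nwindows);
    unsigned target = (*cwp + 1u) % nwindows;
    if (window_invalid(wim, target))
        return WINDOW_UNDERFLOW;
    *cwp = target;
    return WINDOW_OK;
}
```

With NWINDOWS = 8, CWP = 0, and only WIM bit 7 set, try_save reports an overflow, which matches the earlier example in which a SAVE from window w0 traps.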

The WIM provides the following fields:

    Field   w31   w30   w29   ...   w2   w1   w0
    Bit     31    30    29    ...   2    1    0



3.4.6. Trap Base Register (TBR) 

The trap base register contains three fields that generate the address of the trap handler when a
trap occurs. These are:

    Field   TBA     tt     zero
    Bits    31:12   11:4   3:0

TBA Bits 31 through 12 comprise the Trap Base Address (TBA), which is controlled by software.
It contains the most-significant 20 bits of the trap table address. (Note that the reset trap is
an exception; it traps to address 0.) The TBA field can be written by the WRTBR instruction.

tt Bits 11 through 4 comprise the Trap Type (tt) field. This is an 8-bit field that is written by the
processor at the time of a trap, and retains its value until the next trap. It provides an offset
into the trap table. The WRTBR instruction does not affect the tt field.

zero Bits 3 through 0 are zeroes. The WRTBR instruction does not affect this field.

For additional information, see the section Traps, Exceptions, and Error Handling.
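
Because the three TBR fields are simply concatenated, the handler address for a given trap type can be sketched as follows. The function name trap_vector is an assumption for illustration, and the sketch does not model the reset trap, which goes to address 0.

```c
#include <assert.h>
#include <stdint.h>

/* Combine TBA (bits 31:12 of the TBR) with a trap type placed in bits 11:4.
 * Bits 3:0 are zero, so consecutive trap types are 16 bytes apart. */
static uint32_t trap_vector(uint32_t tbr, unsigned tt)
{
    assert(tt <= 0xFFu);
    return (tbr & 0xFFFFF000u) | ((uint32_t)tt << 4);
}
```

For example, with TBA corresponding to address 0x40000000 and tt = 5, the resulting address is 0x40000050.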



3.4.7. Y Register 

The multiply step instruction (MULScc) uses the 32-bit Y register to create 64-bit products. An 
example algorithm is described in Appendix B. 

This register can be read and written using the RDY and WRY instructions. 



3.5. Floating-Point Registers 

The floating-point unit has 32 working registers called f registers, a Floating-Point State Register 
(FSR) that contains mode and status information about the FPU, and a Floating-Point Queue 
(FQ) that holds one or more 64-bit instruction/address pairs. Software uses the FQ to recover 
from floating-point exceptions. 



3.5.1. Floating-Point f registers 

The 32-bit f registers are numbered from f[0] to f[31]. These can be read and written by floating- 
point operate (FPop and FPcmp) instructions, or by load/store single/double floating-point 
instructions (LDF, LDDF, STF, STDF). They are addressable at all times. 

A single f register can hold one single-precision operand. Double-precision operands require an f
register pair, where the double-e datum occupies an even-numbered register, and the double-f
datum occupies the following odd-numbered register. Extended-precision operands require an f
register quad, with extended-e, extended-f, extended-f-low, and extended-u in register addresses
0, 1, 2, and 3 modulo 4, respectively. Thus, the f register file can hold 8 extended, 16 double, or
32 single-precision operands.



3.5.2. Floating-Point State Register (FSR) 

The FSR register fields contain FPU mode and status information. The fields are:

    Field   RD      RP      TEM     AU   reserved   ftt     qne   res   fcc     aexc   cexc
    Bits    31:30   29:28   27:23   22   21:17      16:14   13    12    11:10   9:5    4:0



Rounding Direction (RD) 

Bits 31 and 30 select the rounding direction for floating-point results, according to the 
ANSI/IEEE 754-1985 Standard: 



    RD   Round Toward:
    0    Nearest (even, if a tie)
    1    0
    2    +∞
    3    -∞



Extended Rounding Precision (RP) 

Bits 29 and 28 determine the precision to which extended results are rounded, according to
the ANSI/IEEE 754-1985 Standard:

    RP   Round to:
    0    Extended
    1    Single
    2    Double
    3    (Unused)



Trap Enable Mask (TEM) 

Bits 27 through 23 are enable bits for each of the five floating-point exceptions that can be
indicated in the current_exception field (cexc). (See the definition of cexc below.) If a
floating-point operate instruction generates one or more exceptions and the TEM bit
corresponding to one or more of the exceptions is set (1), an fp_exception trap is caused. A
reset (0) TEM bit prevents that exception type from generating a trap. (See below.)

    Field   NVM   OFM   UFM   DZM   NXM
    Bit     27    26    25    24    23

The TEM field may be read and written by the STFSR and LDFSR instructions.

An implementation need not implement all of the TEM bits as defined above, except NXM, 
which must be implemented as described above. If a particular bit of the TEM field is not 
implemented according to the above definition, then it is implemented as a state bit instead. 
That is, if the particular bit is written to a value by a LDFSR instruction, that same value will 
be read by a subsequent STFSR instruction. 

Abrupt Underflow (AU) 

Bit 22, when set to 1, causes denormalized floating-point operands and/or results to be 
rounded to zero. The definition of AU mode is implementation-dependent and is not defined 
by the ANSI/IEEE 754-1985 Standard. 

Reserved 

Bits 21 through 17 and bit 12 are reserved. When read by an STFSR instruction, this field 
delivers all zeroes. This field should only be written to zero by the LDFSR instruction. 



Floating-Point Trap Type (ftt)

Bits 16 through 14 identify fp_exception traps. After a floating-point exception trap occurs,
the ftt field encodes the type of exception. ftt remains valid until the next FPop instruction
completes. (Note that the exception-causing FPop and its address are in the first entry of
the Floating-point Queue; see below.)

The ftt field can be read by the STFSR instruction. An LDFSR instruction does not affect ftt.
This field encodes the exception types as follows:

    ftt   Trap Type
    0     None
    1     IEEE_exception
    2     unfinished_FPop
    3     unimplemented_FPop
    4     sequence_error



An IEEE_exception indicates that an ANSI/IEEE 754-1985 exception occurred for the
FPop identified by the front entry of the FQ. The exception type(s) is indicated in the
cexc field. If the IEEE_exception results in an fp_exception trap (as determined by the
TEM), then the destination f register, fcc, and aexc fields remain unchanged. However,
if the IEEE_exception does not result in a trap, then the f register, fcc, and aexc fields
are updated to their new values.

An unfinished_FPop indicates that an implementation's FPU was unable to generate
correct results or exceptions, as defined by the ANSI/IEEE 754-1985 Standard. In this
case, the cexc field is undefined. (However, the aexc and fcc fields, and the destination f
register are not affected by the exception.)

An unimplemented_FPop indicates that an implementation's FPU decoded an FPop that
it did not implement. In this case, the cexc field is undefined. (However, the aexc and
fcc fields, and the destination f register are not affected by the exception.)



3.5.3. Programming Note 

In the case of an unfinished_FPop or unimplemented_FPop, the software should emulate or
reexecute the instructions in the FQ, and update the FSR and destination f register(s).

A sequence_error indicates that an FPop or a load floating-point instruction is fetched while
the FPU is in FPU_exception_mode, waiting for the FQ to be emptied by software. (See
Appendix C.)



Queue Not Empty (qne)

Bit 13 indicates whether the Floating-point Queue (FQ) is empty after an fp_exception trap or
after a Store Double Floating-point Queue (STDFQ) instruction is executed. If qne = 0, the
queue is empty; if qne = 1, the queue is not empty.

The qne bit can be read by the STFSR instruction. The LDFSR instruction does not affect
qne. However, executing successive STDFQ instructions will (eventually) cause the FQ to
become empty (qne = 0).



Floating-point Condition Codes (fcc)

Bits 11 and 10 contain the FPU condition codes. These bits are updated by floating-point
compare instructions (FCMP and FCMPE) and are read and written by the STFSR and
LDFSR instructions, respectively. Note that fcc is updated even if FCMPE generates an
IEEE_exception trap.

In the following table, fs1 and fs2 correspond to the values in the f registers specified by an
instruction's rs1 and rs2 fields. The question mark (?) indicates an unordered relation, which
is true if either fs1 or fs2 is a signaling or quiet NaN (see the section Processor Data Types
in the section SPARC Architecture Overview).

The FBfcc instruction bases its control transfer on this field, which is interpreted as follows:

    fcc   Relation
    0     fs1 = fs2
    1     fs1 < fs2
    2     fs1 > fs2
    3     fs1 ? fs2 (unordered)
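
The fcc encoding can be illustrated with a short C sketch. The names fcc_relation and fsr_with_fcc are assumptions for illustration. The sketch compares C doubles and does not model operand precision or the difference between FCMP and FCMPE.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Return the 2-bit fcc code for the relation between fs1 and fs2. */
static unsigned fcc_relation(double fs1, double fs2)
{
    if (isnan(fs1) || isnan(fs2))
        return 3u;                    /* unordered */
    if (fs1 == fs2)
        return 0u;                    /* equal */
    return (fs1 < fs2) ? 1u : 2u;     /* less than, greater than */
}

/* Place an fcc code into bits 11:10 of an FSR image, leaving other bits alone. */
static uint32_t fsr_with_fcc(uint32_t fsr, unsigned fcc)
{
    assert(fcc <= 3u);
    return (fsr & ~(3u << 10)) | ((uint32_t)fcc << 10);
}
```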



Accrued Exception Bits (aexc)

Bits 9 through 5 accumulate IEEE floating-point exceptions while fp_exception traps are
disabled. After an FPop completes, the TEM and cexc fields are logically and'ed together. If the
result is nonzero, an fp_exception trap is generated; otherwise, the new cexc field is or'd
into the aexc field. Thus, while traps are masked, exceptions are accumulated in the aexc
field. (See below.)

    Field   nva   ofa   ufa   dza   nxa
    Bit     9     8     7     6     5

The aexc field is read and written by the STFSR and LDFSR instructions. 

An implementation need not implement all of the aexc bits as defined above, except nxa, 
which must be implemented as described above. If a particular bit of the aexc field is not 
implemented according to the above definition, then it is implemented as a state bit instead. 
That is, if the particular bit is written to a value by a LDFSR instruction, that same value will 
be read by a subsequent STFSR instruction. 



Current Exception Bits (cexc) 

Bits 4 through 0 indicate one or more IEEE exceptions that were generated by the most
recently executed FPop instruction. The absence of an exception causes the corresponding
bit to be cleared.

    Field   nvc   ofc   ufc   dzc   nxc
    Bit     4     3     2     1     0



The cexc field is read and written by the STFSR and LDFSR instructions. 

An implementation need not implement all of the cexc bits as defined above, except nxc, 
which must be implemented as described above. If a particular bit of the cexc field is not 
implemented according to the above definition, then it is implemented as a state bit instead. 
That is, if the particular bit is written to a value by a LDFSR instruction, that same value will 
be read by a subsequent STFSR instruction. 

The cexc bits are not defined following an FPop that causes an unimplemented_FPop or 
unfinished_FPop fp_exception trap. Following an FPop that does not generate an 
fp_exception trap or that generates an IEEE_exception trap, the cexc bits are set as follows: 

nvc = 1 indicates invalid: an operand is improper for the operation to be performed. For
example, 0/0 and ∞ - ∞ are invalid.

ofc = 1 indicates overflow: the rounded result would be larger in magnitude than the
largest normalized number in the specified format.

ufc = 1 indicates underflow: the rounded result is inexact, and would be smaller in
magnitude than the smallest normalized number in the indicated format.

dzc = 1 indicates division-by-zero: X/0, where X is subnormal or normalized. Note that
0/0 does not set the dzc bit.

nxc = 1 indicates inexact: the rounded result differs from the infinitely precise correct
result.

The following illustration summarizes the handling of IEEE_exception traps. Note that the aexc
and ftt fields can normally only be cleared by software.

    FPop generates an IEEE exception;
    cexc <- IEEE exceptions generated by this FPop;
    if ( cexc and TEM ) = 0
        then ( aexc <- aexc or cexc; f[] <- result; fcc <- fcc_result )
        else ( ftt <- IEEE_exception; cause fp_exception trap )
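
The same decision can be sketched in C, operating on a 32-bit image of the FSR. The function name fsr_after_fpop is an assumption for illustration. The sketch updates only the cexc, aexc, and ftt fields, so writing the destination f register and the fcc field is left to the caller. It relies on the TEM bits (27:23) and the cexc bits (4:0) appearing in the same order.

```c
#include <assert.h>
#include <stdint.h>

/* Apply the exceptions raised by one FPop to an FSR image.
 * cexc holds the five exception bits in cexc order (nvc is bit 4, nxc is bit 0).
 * Returns 1 if an fp_exception trap would be caused, 0 otherwise. */
static int fsr_after_fpop(uint32_t *fsr, unsigned cexc)
{
    assert(fsr != 0 && cexc <= 0x1Fu);
    uint32_t tem = (*fsr >> 23) & 0x1Fu;          /* TEM: bits 27:23 */

    *fsr = (*fsr & ~0x1Fu) | cexc;                /* cexc <- exceptions of this FPop */
    if ((cexc & tem) == 0u) {
        *fsr |= (uint32_t)cexc << 5;              /* aexc <- aexc or cexc */
        return 0;
    }
    *fsr = (*fsr & ~(7u << 14)) | (1u << 14);     /* ftt <- IEEE_exception (1) */
    return 1;
}
```

For example, an inexact result (nxc) with NXM clear only accumulates into nxa, while the same result with NXM set leaves aexc untouched and records ftt = 1.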



3.5.4. Programming Note 

Since the operating system must be capable of simulating the entire FPU in order to properly
handle the unimplemented_FPop and unfinished_FPop floating-point exceptions, a user process
always "sees" a fully implemented FSR as defined above. In other words, a user process always
"sees" cexc, aexc, and TEM fields that conform to the ANSI/IEEE 754-1985 Standard.



3.5.5. Floating-Point Queue (FQ) 

The Floating-point Queue keeps track of FPops that are pending completion by the FPU when an 
fp_exception trap occurs. When an fp_exception trap occurs, the first entry in the queue gives 
the address of the FPop that caused the exception and the instruction itself. Any remaining 
entries in the queue contain FPop instructions (and their addresses) that had not finished when 
the exception occurred. 



3.5.6. Implementation Note 

If an implementation provides n entries in the queue, at most n FPops can execute simultane- 
ously in the FPU. For example, if the FPU provides one adder and one multiplier that can 
operate independently, then the FQ has no fewer than two entries.



SECTION 4: INSTRUCTIONS



4.1. Introduction

Functionally, SPARC architecture instructions fall into six categories: 1) load and store, 2)
arithmetic/logical/shift, 3) control transfer, 4) read/write control register, 5) floating-point operate,
and 6) coprocessor operate. Instructions may also be classified into three major formats, two of
which include subformats.



4.2. Instruction Formats 

The three instruction formats are called format 1, format 2, and format 3. Figure 4-1 shows each
instruction format, with its fields and bit positions. It also lists the types of instructions that use
that format.

The fields in these instructions have the following meanings:

op This field places the instruction into one of the 3 major formats:

    Format   op value   Instruction
    1        1          Call
    2        0          Bicc, FBfcc, CBccc, SETHI
    3        2 or 3     other



op2 This field comprises bits 24 through 22 of format 2 instructions. It selects the instruction as
follows:

    op2 value   Instruction
    0           UNIMP
    2           Bicc
    4           SETHI
    6           FBfcc
    7           CBccc



rd For store instructions, this field selects an r register (or an r register pair), or an f register
(or an f register pair) to be the source. For all other instructions, this field selects an r register
(or an r register pair), or an f register (or an f register pair) to be the destination.

NOTE

Reading r[0] produces the result 0, and writing it causes the
result to be discarded.

For more information on r registers, see the section Registers. 

a The "a" bit means "annul" in format 2 instructions. This bit changes the behavior of the
instruction encountered immediately after a control transfer, as described later in this section.

cond

This field selects the condition code for format 2 instructions.

imm22

This field is a 22-bit constant value used by the SETHI instruction.

disp22 and disp30

These fields are 22-bit and 30-bit sign-extended word displacements, for PC-relative branches
and calls, respectively.

op3 The op3 field selects one of the format 3 opcodes.

i The i bit selects the type of the second ALU operand for non-FPop instructions. If i = 0, the
second operand is r[rs2]. If i = 1, the second operand is sign-extended simm13.

asi This 8-bit field is the address space identifier generated by load/store alternate instructions.
See discussion below.

rs1 This 5-bit field selects the first source operand from either the r registers or the f registers.

rs2 This 5-bit field selects the second source operand from either the r registers or the f
registers.

simm13

This field is a sign-extended 13-bit immediate value used as the second ALU operand when i = 1.

opf

This 9-bit field identifies a floating-point operate (FPop) instruction or a coprocessor operate
(CPop) instruction. Note that it uses the synonym opc for coprocessor operate instructions
(see the coprocessor operate instructions in Appendix B). A table in Appendix F shows the
relationship between the opf field and FPop instructions.
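
The field definitions above can be illustrated with a small decoder. This is a sketch only. The struct and function names are assumptions, and apart from op2 (bits 24 through 22, as stated above) the bit positions are taken from the published SPARC instruction formats rather than from this text, since Figure 4-1 is not reproduced here. The decoder extracts every field unconditionally, and the caller uses the ones that apply to the format selected by op.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned op;               /* bits 31:30, all formats */
    int32_t  disp30;           /* bits 29:0, format 1 */
    unsigned rd;               /* bits 29:25, SETHI and format 3 */
    unsigned a, cond;          /* bit 29 and bits 28:25, format 2 branches */
    unsigned op2;              /* bits 24:22, format 2 */
    uint32_t imm22;            /* bits 21:0, SETHI */
    int32_t  disp22;           /* bits 21:0, branches */
    unsigned op3, rs1, i;      /* bits 24:19, 18:14, 13, format 3 */
    unsigned asi, opf, rs2;    /* bits 12:5, 13:5, 4:0, format 3 */
    int32_t  simm13;           /* bits 12:0, format 3 with i = 1 */
} sparc_fields;

/* Sign-extend the low `width` bits of value. */
static int32_t sign_extend(uint32_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint32_t sign = 1u << (width - 1u);
    if (width < 32)
        value &= (1u << width) - 1u;
    return (int32_t)((value ^ sign) - sign);
}

static sparc_fields decode_fields(uint32_t insn)
{
    sparc_fields f;
    f.op     = insn >> 30;
    f.disp30 = sign_extend(insn, 30);
    f.rd     = (insn >> 25) & 0x1Fu;
    f.a      = (insn >> 29) & 1u;
    f.cond   = (insn >> 25) & 0xFu;
    f.op2    = (insn >> 22) & 7u;
    f.imm22  = insn & 0x3FFFFFu;
    f.disp22 = sign_extend(insn, 22);
    f.op3    = (insn >> 19) & 0x3Fu;
    f.rs1    = (insn >> 14) & 0x1Fu;
    f.i      = (insn >> 13) & 1u;
    f.asi    = (insn >> 5) & 0xFFu;
    f.opf    = (insn >> 5) & 0x1FFu;
    f.rs2    = insn & 0x1Fu;
    f.simm13 = sign_extend(insn, 13);
    return f;
}

/* PC-relative target: the word displacement is shifted left by 2 and added to PC. */
static uint32_t branch_target(uint32_t pc, int32_t word_disp)
{
    return pc + ((uint32_t)word_disp << 2);
}
```

For example, the word 0x03012345 decodes as a format 2 instruction with op2 = 4 (SETHI), rd = 1, and imm22 = 0x12345.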



4.3. Load/Store Instructions 

Load and store instructions are the only instructions that access memory and registers external
to the processor. They generate a 32-bit byte address. In addition to the address, the processor
always generates an address space identifier, or asi.



4.3.1. Address Space Identifier 

The address space identifier generated by the processor is made available to the external sys- 
tem to distinguish up to 256 address spaces. These spaces can include system control registers, 
main memory, etc. The number of defined spaces is implementation-dependent. 

The SPARC architecture defines four address spaces and their asi values; these appear in Table
4-3. They indicate to the external system whether the processor is in user or supervisor mode
(as indicated by the PSR), and whether the access is an instruction or a data reference.



    asi      Assignment
    0-7      Implementation-definable
    8        User instruction space
    9        Supervisor instruction space
    10       User data space
    11       Supervisor data space
    12-255   Implementation-definable

Load/store instructions normally generate an asi of either 10 or 11 for the data access,
depending on whether the processor is in user or supervisor mode. However, the load from alternate
space and store into alternate space instructions use the asi field supplied by the instruction
itself.

Note that the load/store alternate instructions are privileged; they can only be executed in
supervisor mode.
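
The asi selection rules can be summarized in a short C sketch. The function names are assumptions for illustration, and the value -1 is simply this sketch's way of reporting that a load/store alternate instruction was attempted in user mode. It does not name the trap that the processor would take.

```c
#include <assert.h>

/* asi generated for an ordinary access: 8/9 for instruction fetches and
 * 10/11 for data, with the odd value used in supervisor mode. */
static unsigned default_asi(int supervisor, int instruction_fetch)
{
    return (instruction_fetch ? 8u : 10u) + (supervisor ? 1u : 0u);
}

/* asi generated for a data access. For load/store alternate instructions the
 * asi comes from the instruction itself and supervisor mode is required. */
static int data_access_asi(int supervisor, int alternate, unsigned insn_asi)
{
    assert(insn_asi <= 255u);
    if (!alternate)
        return (int)default_asi(supervisor, 0);
    if (!supervisor)
        return -1;                /* privileged: not permitted in user mode */
    return (int)insn_asi;
}
```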



4.3.2. Addressing Conventions 

The load and store instructions use the following addressing conventions:

Bytes

For load and store byte instructions, increasing the address generally means decreasing the
significance of the byte within a word: the most significant byte (MSB) of a word is accessed
when address bits <1:0> are 0, and the least significant byte (LSB) is accessed when
address<1:0> = 3.

Halfwords

For load and store halfword instructions, when address bit 1 = 1, the least significant
halfword of a word is accessed, and when address bit 1 = 0, the most significant halfword is
accessed.

Doublewords

For load and store double instructions, the most significant word is accessed when address
bit 2 = 0, and the least significant word is accessed when address bit 2 = 1.

In general, the address of a doubleword, word, or halfword is the address of its most significant
byte. These conventions are illustrated in the following figure:

    Bytes         address<1:0>      0          1          2          3
                  bits              7..0       7..0       7..0       7..0
                                    MSB                              LSB

    Halfwords     address<1:0>      0                     2
                  bits              15..0                 15..0

    Word          address<1:0>      0
                  bits              31..0

    Doubleword    address<2>        0                     1
                  bits              63..32                31..0

A doubleword-aligned datum is located at a doubleword address, which must be evenly divisible 
by 8. A word-aligned datum is located at a word address, which must be evenly divisible by 4. A 
halfword-aligned datum is located at a halfword address, which must be divisible by 2. 

If a doubleword, word, or halfword load or store instruction generates an improperly aligned 
address, a memory_address_not_aligned trap occurs. 
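
The byte-ordering and alignment rules above can be sketched in C. The function names are assumptions for illustration. The helpers select a byte or halfword out of a 32-bit word value according to the low address bits, and is_aligned gives the condition under which no memory_address_not_aligned trap would occur.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if address is a multiple of size (1, 2, 4, or 8 bytes). */
static int is_aligned(uint32_t address, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return (address & (size - 1u)) == 0u;
}

/* Byte selected by address<1:0>: 0 selects the MSB, 3 selects the LSB. */
static unsigned byte_of_word(uint32_t word, uint32_t address)
{
    return (unsigned)((word >> (8u * (3u - (address & 3u)))) & 0xFFu);
}

/* Halfword selected by address bit 1: 0 selects the most significant half. */
static unsigned halfword_of_word(uint32_t word, uint32_t address)
{
    assert(is_aligned(address, 2));
    return (unsigned)((word >> ((address & 2u) ? 0u : 16u)) & 0xFFFFu);
}
```

For the word 0x11223344, address offset 0 selects the byte 0x11 and offset 3 selects 0x44.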



4.4. Arithmetic, Logical, and Shift Instructions 

All of these instructions compute some result that is a function of two source operands, and
either write the result into a destination r register (r[rd]) or discard it. One of the operands is
always r[rs1]. The other operand depends on the i bit in the instruction: if i = 0, the operand is
r[rs2], but if i = 1, the operand is the sign-extended constant sign_extend(simm13).

Reading r[0] produces the value zero. If the destination field indicates a write into r[0], no r
register is modified and the result is discarded.

Most of these instructions have dual versions which modify the integer condition codes (icc) as a
side effect.



4.4.1. Programming Note

r[0] can be used to implement a register-to-register move in one of several ways: ADD with 0, OR
with 0, etc. Subtract and set condition codes (SUBcc) can be used as an integer COMPARE
instruction.

The tagged add and subtract instructions (TADDcc, TSUBcc, TADDccTV and TSUBccTV)
operate on tagged data where the tag is the low-order two bits of the data. If either of the
instruction's two operands has a nonzero tag, the overflow bit of the PSR is set. The "trap on
overflow" versions, TADDccTV and TSUBccTV, in addition to writing the condition codes, also
cause an overflow trap.



4.4.2. Programming Note 

One possible model for tagging is to use 0 to tag integers and 3 for pointers to doublewords, i.e.
list cells.

If trapping overhead is insignificant, then TADDccTV or TSUBccTV is faster than the
non-trapping versions, which would need to be followed by "branch on overflow" instructions.

Suppose p contains a tagged pointer to a list cell, i.e. has 3 in its low-order two bits. Since the
load and store instructions execute successfully only with properly aligned addresses, a load or
store word with an address specifier of "p - 3" or "p + 1" will succeed, accessing the first or
second word of the list cell, respectively; if, on the other hand, p contains a tag value other than
3, they will trap.
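
The tagging model in this note can be sketched in C. The function names are assumptions for illustration. The sketch sets the overflow indication when either operand has a nonzero tag, as described above, and also on ordinary signed overflow of the sum. The second condition is how tagged add is normally defined for SPARC, but it is not stated in this note.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a tagged add: returns the 32-bit sum and reports the overflow bit. */
static uint32_t tagged_add(uint32_t a, uint32_t b, int *overflow)
{
    assert(overflow != 0);
    uint32_t sum = a + b;
    int tag_nonzero = ((a | b) & 3u) != 0u;                  /* low two bits are the tag */
    int arithmetic  = (int)((~(a ^ b) & (a ^ sum)) >> 31);   /* signed overflow */
    *overflow = tag_nonzero || arithmetic;
    return sum;
}

/* A word access at p + offset succeeds only if the address is word-aligned. */
static int word_access_ok(uint32_t p, int32_t offset)
{
    return ((p + (uint32_t)offset) & 3u) == 0u;
}
```

With a pointer tag of 3, the offsets -3 and +1 both produce word-aligned addresses. With any other tag they do not, so the access would trap.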

Shift instructions shift an r register left or right by a constant or variable amount, as described in
Appendix B. None of the shift instructions changes the condition codes.

The "set high 22 bits of r" (SETHI) instruction writes a 22-bit constant from the instruction into the 
high-order bits of the destination register. It clears the low-order 10 bits, and does not change 
the condition codes. 

4.4.3. Programming Note 

SETHI can be used to construct a 32-bit constant using two instructions. 
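
As an illustration, the two-instruction sequence can be modeled in C: SETHI supplies the upper 22 bits and clears the low 10 bits, and a following OR with a 10-bit immediate fills in the rest. The function names are assumptions for illustration, and OR is just one possible choice for the second instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of SETHI on the destination register: imm22 goes into bits 31:10
 * and the low-order 10 bits are cleared. */
static uint32_t sethi(uint32_t imm22)
{
    assert(imm22 <= 0x3FFFFFu);
    return imm22 << 10;
}

/* Build an arbitrary 32-bit constant in two steps. */
static uint32_t build_constant(uint32_t value)
{
    uint32_t reg = sethi(value >> 10);   /* first instruction: high 22 bits */
    return reg | (value & 0x3FFu);       /* second instruction: OR in the low 10 bits */
}
```

The low 10 bits always fit in the positive range of the 13-bit signed immediate, so no sign-extension correction is needed in the second step.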

4.5. Control Transfer Instructions

Control-transfer instructions change the values of PC and nPC. There are five types of control
transfer instructions:

1) Conditional branch (Bicc, FBfcc, CBccc)

2) Jump and Link (JMPL) 

3) Call (CALL) 

4) Trap (Ticc) 

5) Return from trap (RETT) 

Each of these can be further categorized according to whether it is 1) PC-relative or register- 
indirect, or 2) delayed or non-delayed. The following matrix shows these characteristics: 






Solbourne Computer, Inc. 



    Instruction                 PC-relative or       Delayed
                                Register-indirect
    Bicc, FBfcc, CBccc, CALL    PC-Relative          Yes
    JMPL, RETT                  Reg-Indirect         Yes
    Ticc                        Reg-Indirect         No



The following paragraphs describe each of the characteristics: 

PC-relative 

A PC-relative control transfer computes its target address by adding the (shifted) sign- 
extended immediate displacement to the program counter (PC). 

Register-indirect 

A register-indirect instruction computes its target address as either "r[rs1] + r[rs2]" if i = 0, or
"r[rs1] + sign_ext(simm13)" if i = 1.

Delayed 

A control transfer instruction is delayed if it transfers control to the target address after a 
one-instruction delay. Delayed control transfers are described in the next section. 

4.5.1. Delayed Control Transfers 

Traditional architectures usually execute the target of a control transfer instruction immediately 
after the control-transfer instruction. The SPARC architecture delays by one instruction the exe- 
cution of the target of a delayed control-transfer instruction. The instruction encountered immedi- 
ately after a delayed control transfer is called the delay instruction. 



4.5.2. PC and nPC

In general, the PC points to the instruction being executed by the IU, and the nPC points to the
instruction to be executed next. Most instructions complete by copying the contents of the nPC
into the PC, and then either incrementing the nPC by 4 or, if the instruction implies a control
transfer, writing the computed target address into the nPC. The PC now points to the instruction
that will be executed next, and the nPC points to the instruction that will be executed after the
next one; in other words, two instructions hence.

The sequence is:

    PC  <- nPC
    nPC <- nPC + 4 or target address



4.5.3. Delay Instruction 

The instruction pointed to by the nPC when a delayed control-transfer instruction is encountered 
is called the delay instruction. Normally, this is the next sequential instruction in the code space. 
However, if the instruction that preceded the delayed control transfer was itself a delayed control 
transfer, the address of the delay instruction is the target of the (first) control-transfer instruction, 
since that is where the nPC will point. This behavior is explained further in the section Back-to- 
Back Delayed Control Transfers below. 

The following example shows the order of execution for a simple (not back-to-back) delayed
control transfer. The order of execution is 8, 12, 16, 40. If the delayed control-transfer instruction
were not taken, the order would be 8, 12, 16, 20.








    PC before     nPC before    Instruction
    instruction   instruction
    8             12            Non-control transfer
    12            16            Control transfer (target = 40)
    16            40            Non-control transfer (delay instruction)
                                Transfers control to 40
    40            44            ...
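
The PC and nPC update rule reproduces this execution order directly. The following C sketch is illustrative: the names iu_pcs and step are assumptions, and the caller says whether the instruction at PC is a taken control transfer and, if so, what its target is.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;     /* address of the instruction being executed */
    uint32_t npc;    /* address of the instruction to be executed next */
} iu_pcs;

/* Complete one instruction: PC <- nPC, then nPC <- nPC + 4 or the target. */
static void step(iu_pcs *s, int transfers, uint32_t target)
{
    assert(s != 0);
    uint32_t next = transfers ? target : s->npc + 4u;
    s->pc  = s->npc;
    s->npc = next;
}
```

Starting from PC = 8 and nPC = 12, stepping a non-control transfer, then a control transfer to 40, then the delay instruction visits addresses 8, 12, 16, and 40 in that order.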



4.5.4. Annul Bit 

The a (annul) bit changes the behavior of the delay instruction. This bit is only available on
conditional branch instructions (Bicc, FBfcc and CBccc). If a is set on a conditional branch (except
BA, FBA and CBA) and the branch is not taken, the delay instruction is "annulled" (not
executed). An annulled instruction has no effect on the state of the IU, nor can a trap occur during an
annulled instruction. If the branch is taken, the a bit is ignored and the delay instruction is
executed. For example:



PC     nPC    Instruction             Action

 8     12     Non-control transfer    Executed
12     16     Bicc (a=1) 40           Not taken
16     40     Non-control transfer    Annulled (not executed)
20     24     ...                     Executed


PC     nPC    Instruction             Action

 8     12     Non-control transfer    Executed
12     16     Bicc (a=0) 40           Not taken
16     40     ...                     Executed
40     44     ...                     Executed



BA, FBA and CBA instructions are a special case; if the a bit is set in these instructions the delay 
instruction is not executed if the branch is taken, but it is executed if the branch is not taken. 
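The annul rules described above can be condensed into a single predicate. The following Python sketch is illustrative only; the function name and its parameters are assumptions made for the example and are not taken from the architecture.

```python
def delay_instruction_executed(a, always, taken):
    """Decide whether the delay instruction executes.

    a      -- annul bit of the branch (0 or 1)
    always -- True for BA, FBA and CBA
    taken  -- True if a conditional branch is taken (ignored when always)
    """
    if a == 0:
        return True   # without the annul bit the delay instruction always runs
    if always:
        return False  # BA, FBA, CBA with a = 1 annul the delay instruction
    return taken      # conditional, a = 1: annulled only when not taken


for a in (1, 0):
    for always, taken in ((True, True), (False, True), (False, False)):
        print(a, always, taken, delay_instruction_executed(a, always, taken))
```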

The following display shows the effect of the a bit on the delay instruction after various kinds of 
branches: 



a bit     Type of branch            Delay instr. executed?

a = 1     Always                    No
          Conditional, taken        Yes
          Conditional, not taken    No
a = 0     Always                    Yes
          Conditional, taken        Yes
          Conditional, not taken    Yes



4.5.5. Programming Notes

The annul bit increases the likelihood that a compiler or optimizer can place a useful instruction in 
the delay slot after a branch. Refer to the following table: 



Address   Instruction                         Target

L:        non-control-transfer instruction
L*:       ...
          Bicc                                L
D:        NOP



If the Bicc has a = 0, a code optimizer may be able to move a non-control-transfer instruction
from within the loop into location D. If the Bicc has a = 1, then the compiler can copy the
non-control-transfer instruction at location L into location D, and change the branch to Bicc L*.

The annul bit can also be used to optimize "if-then-else" statements. Since the conditional 
branch instructions provide both true and false tests for all the conditions, an optimizer can 
arrange the code so that a non-control-transfer instruction from either the "else" branch or the 
"then" branch can be moved into the delay position after the branch instruction. For example: 



Address   Instruction               |  Address   Instruction

          Bicc (cond, a=1) THEN     |            Bicc (cond, a=1) ELSE
Delay:    then-phrase-instr-1       |  Delay:    else-phrase-instr-1
          else-phrase-instr-1       |            then-phrase-instr-1
          else-phrase-instr-2       |            then-phrase-instr-2
          goto ...                  |            goto ...
THEN:     then-phrase-instr-2       |  ELSE:     else-phrase-instr-2
          then-phrase-instr-3       |            else-phrase-instr-3



When set in a branch always instruction (BA, FBA), the annul bit implements a "traditional,"
non-delayed branch instruction. This can also be used to dynamically replace unimplemented
instructions with branches to software emulation routines, as this requires less overhead than a
trap.



4.5.6. Calls and Returns 

A procedure that requires a register window is invoked by executing both a CALL (or a JMPL) 
and a SAVE instruction. A procedure that does not need a register window, a so-called "leaf" 
routine, is invoked by executing only a CALL (or a JMPL). Leaf routines can use only the out 
registers. 

The CALL instruction stores PC, which points to the CALL itself, into register r[15] (an out
register). JMPL stores PC, which points to the JMPL instruction, into the specified r register.
These instructions then cause a transfer of control to a target that can be arbitrarily distant.

The SAVE instruction is similar to an ADD instruction, except that it also decrements the CWP by
one, causing the active window to become the previous window, thereby "saving" the caller's
window. Also, the source registers for the addition are from the previous window while the result
is written into the new window.

A procedure that uses a register window returns by executing both a RESTORE and a JMPL
instruction. A leaf procedure returns by executing a JMPL only. The JMPL instruction typically
returns to the instruction following the CALL's or JMPL's delay instruction; in other words, the
typical return address is 8 plus the address saved by the CALL.

The RESTORE instruction, also like an ADD instruction, increments the CWP by one, causing
the previous window to become the active window, thereby "restoring" the caller's window. Also,
the source registers for the addition are from the current window while the result is written into
the previous window.

Both SAVE and RESTORE compare the new CWP against the Window Invalid Mask (WIM) to 
check for window overflow or underflow. 
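The window bookkeeping performed by SAVE and RESTORE can be sketched as follows. This is a simplified model, not a definition: it assumes 8 implemented windows (the actual number is implementation-dependent), assumes that bit n of the WIM marks window n invalid, and ignores the ADD-like register operation. The names `NWINDOWS`, `save` and `restore` are chosen for the example.

```python
NWINDOWS = 8  # assumed for illustration; implementation-dependent


def save(cwp, wim):
    """Model SAVE: decrement the CWP unless the new window is invalid."""
    new_cwp = (cwp - 1) % NWINDOWS
    if wim & (1 << new_cwp):
        return cwp, "window_overflow"   # trap taken; CWP is not changed by SAVE
    return new_cwp, None


def restore(cwp, wim):
    """Model RESTORE: increment the CWP unless the new window is invalid."""
    new_cwp = (cwp + 1) % NWINDOWS
    if wim & (1 << new_cwp):
        return cwp, "window_underflow"  # trap taken; CWP is not changed by RESTORE
    return new_cwp, None


print(save(3, 0b00000000))     # (2, None)
print(save(0, 0b10000000))     # (0, 'window_overflow')
print(restore(7, 0b00000001))  # (7, 'window_underflow')
```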



4.5.7. Programming Note 

The SAVE and RESTORE instructions can be used to atomically update the CWP while estab- 
lishing a new memory stack pointer in an r register. 



4.5.8. Trap (Ticc) Instruction

The Ticc instruction evaluates the condition codes specified by its cond field, and if the result
is true, it causes a trap with no delay instruction. If the condition codes evaluate to false, it
executes as a NOP.

A taken Ticc identifies the software trap by writing "trap_number + 128" into the tt field of the
TBR. The processor enters supervisor mode, disables traps, decrements the CWP, and saves
PC and nPC into the locals r[17] and r[18] (respectively) of the new window.



4.5.9. Programming Note 

Ticc can be used to implement kernel calls, breakpointing, and tracing. It can also be used for
run-time checks, such as out-of-range array indices, integer overflow, etc.



4.5.10. Delayed Control-Transfer Couples

When a delayed control transfer is encountered immediately after another delayed control 
transfer, this creates what is called a delayed control-transfer couple, which the processor han- 
dles differently from a simple control transfer. 

The following tables show, first, a sequence of instructions that includes a delayed control-
transfer couple, and second, a table that illustrates the order of execution depending on the
nature of the control-transfer instructions.

NOTE

In the following tables, "delayed control-transfer instruction" is
abbreviated to "DCTI". Note that a "non-DCTI" may be either a
non-control-transfer instruction, or a control-transfer instruction
which is not delayed (i.e., a Ticc).



address:   instruction   target

 8:        non-DCTI
12:        DCTI          40
16:        DCTI          60
20:        non-DCTI
24:        ...
40:        non-DCTI
44:        ...
60:        non-DCTI
64:        ...


Case   12: DCTI 40           16: any DCTI 60       Order of Execution

1      DCTI unconditional    DCTI taken            12, 16, 40, 60, 64, ...
2      DCTI unconditional    B*cc (a=0) untaken    12, 16, 40, 44, ...
3      DCTI unconditional    B*cc (a=1) untaken    12, 16, 44, 48, ... (40 annulled)
4      DCTI unconditional    B*A (a=1)             12, 16, 60, 64, ... (40 annulled)
5      B*A (a=1)             any CTI               12, 40, 44, ... (16 annulled)
6      B*cc                  DCTI                  not supported (see text)



Where the annul bit is not indicated, it may be either 0 or 1. Abbreviations are as follows:

B*A                   BA, FBA, or CBA
B*cc                  Bicc, FBfcc, or CBccc (except B*A)
DCTI unconditional    CALL, JMPL, RETT, or B*A (a=0)
DCTI taken            CALL, JMPL, RETT, B*cc taken, or B*A (a=0)



When the first instruction of a delayed control-transfer couple is a conditional branch, the transfer 
of control is undefined (case 6). If such a couple is executed, the location where execution con- 
tinues is within the same address space but otherwise undefined. This sequence does not 
change any other aspect of the processor state. 

Case 1 of the above table includes the "JMPL, RETT" couple. RETT must always be preceded 
by a JMPL instruction. (If it is not, the location where execution continues is not necessarily 
within the address space implied by the PS bit of the PSR.) 
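Cases 1 through 5 of the table above follow directly from the PC/nPC update rule combined with the annul bit. The Python sketch below reproduces those orders of execution. It is an illustrative model only: the tuple encoding `(target, a, always, taken)` and the function name `run` are assumptions made for the example, and case 6 (undefined behavior) is deliberately not modeled.

```python
def run(program, pc, count):
    """Return the first `count` addresses actually executed.

    program maps an address to (target, a, always, taken) for each DCTI;
    addresses not in the mapping are non-DCTIs.
    """
    npc = pc + 4
    annul_next = False
    order = []
    while len(order) < count:
        if annul_next:
            # An annulled instruction is skipped, but PC and nPC still advance.
            annul_next = False
            pc, npc = npc, npc + 4
            continue
        order.append(pc)
        dcti = program.get(pc)
        if dcti is None:
            pc, npc = npc, npc + 4
            continue
        target, a, always, taken = dcti
        if always or taken:
            new_npc = target
            annul_next = bool(a) and always      # B*A with a = 1
        else:
            new_npc = npc + 4
            annul_next = bool(a)                 # untaken B*cc with a = 1
        pc, npc = npc, new_npc
    return order


UNCOND = (40, 0, True, True)  # e.g. CALL, JMPL, or B*A (a=0) at address 12
print(run({12: UNCOND, 16: (60, 0, False, True)}, 12, 5))   # case 1
print(run({12: UNCOND, 16: (60, 0, False, False)}, 12, 4))  # case 2
print(run({12: UNCOND, 16: (60, 1, False, False)}, 12, 4))  # case 3
print(run({12: UNCOND, 16: (60, 1, True, True)}, 12, 4))    # case 4
print(run({12: (40, 1, True, True), 16: UNCOND}, 12, 3))    # case 5
```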



4.5.11. Programming Note 

Trap handlers complete execution by executing the "JMPL, RETT" couple.



4.6. Read and Write Control Registers

These instructions read or write the contents of the programmer-visible control registers. This 
category includes instructions to read and write the PSR, the WIM, the TBR, the Y register, the 
FSR, and the CSR. These instructions are all privileged (available in supervisor state only), 
except those that read and write the Y register, the FSR, and the CSR. 



4.7. Floating Point Operate (FPop) Instructions 

Floating-point operate instructions (FPops) are generally three-register instructions that compute
some result that is a function of one or two source operands, and place the result in a destination
f register. The exception is floating-point compare operations, which update the fcc field of the
FSR.

The term "FPop" does NOT include the load/store floating-point instructions. 

Multiple-precision instructions assume that their operands are in multiple contiguous f registers.
The operands must be aligned in the f registers according to their size: the number of the first f
register of a multiprecision operand must be a multiple of the operand size in words.

All FPops except move instructions can modify the status fields of the FSR. 

FPops execute concurrently with IU instructions and other FPops. Concurrent operation is
described in the section SPARC Architecture Overview and in Appendix C.

There are no direct IU-to-FPU or FPU-to-IU move instructions.



4.8. Coprocessor Operate (CPop) Instructions 

The coprocessor operate instructions are executed by the attached coprocessor. If there is no 
attached coprocessor, a CPop instruction generates a cp_disabled trap. 

The instruction fields of a CPop instruction, except for op and op3, are interpreted only by the 
coprocessor. A CPop takes all operands from and returns all results to coprocessor registers. 

Note that the term "CPop" does NOT include the load/store coprocessor instructions.



SECTION 5: TRAPS, EXCEPTIONS, AND ERROR HANDLING



5.1. Introduction 

SPARC supports three types of traps: synchronous, floating-point/coprocessor, and asynchronous
(asynchronous traps are also called interrupts). Synchronous traps are caused by an
instruction, and occur before the instruction is completed. Floating-point/coprocessor traps are
caused by a Floating-Point Operate (FPop) or coprocessor (CPop) instruction, and occur before
the instruction is completed. However, due to the concurrent operation of the IU and the FPU,
other non-floating-point instructions may have executed in the meantime. Asynchronous traps
occur when an external event interrupts the processor. They are not related to any particular
instruction and occur between the execution of instructions.

Synchronous and floating-point/coprocessor traps are generally taken before the instruction
changes any processor or system state visible to a programmer; they happen "between"
instructions. Instructions which access memory twice (double loads and stores and atomic
instructions) are the only exceptions.

An instruction is defined to be trapped if any trap occurs during the course of its execution. If
multiple traps occur during one instruction, the highest priority trap is taken. Lower priority traps
are ignored because the traps are arranged under the assumption that the lower priority traps
persist, recur, or are meaningless due to the presence of the higher priority trap. For example, if
a mem_address_not_aligned trap is detected during an instruction fetch, the potential
unimplemented_instruction trap is meaningless because the address is invalid. Pending
interrupts persist; therefore, they have the lowest priority.

The ET bit in the PSR must be set for traps to occur normally. If a synchronous trap occurs while 
traps are disabled the processor halts and enters an error state. In most implementations, this 
causes a reset trap. 



5.1.1. Implementation Note 

Since interrupts are ignored while traps are disabled, they should persist until they are
acknowledged.

Load/store instructions generally trap before the instruction changes the state of the processor.
However, those instructions that do more than one memory access (namely the load and store
doubles and the atomic load and store instructions) may trap on a data_access_exception after
the first memory access, causing a trap after the processor state has been partially modified.
This can only occur for non-resumable exceptions, such as uncorrectable memory errors. (See
Appendix B for instruction descriptions.)



5.2. Trap Addressing 

The Trap Base Register (TBR) generates the exact address of a trap handling routine. When a
trap (other than some types of reset trap) occurs, the hardware writes a value into the trap type
(tt) field of the TBR. This uniquely identifies the trap and serves as an offset into the table whose
starting address is given by the TBA field of the TBR.

The 8-bit wide tt field allows for 256 distinct types of traps. Half of these (0 - 127) are dedicated
to hardware traps, and half (128 - 255) are dedicated to programmer-initiated traps (see the Ticc
instruction). The tt field remains valid until another trap occurs.
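As a rough illustration of how the tt field selects a handler, the sketch below composes a trap vector address from a trap base address and a trap type. It assumes the TBR layout given in the register descriptions (TBA in bits 31:12, tt in bits 11:4, and zeros in bits 3:0); that layout is not restated in this section, so treat it as an assumption of the example. The function names are hypothetical.

```python
def trap_vector(tba, tt):
    """Compose a TBR value from a trap base address and a trap type.

    Assumes TBA occupies bits 31:12 and tt occupies bits 11:4, so each
    trap type owns a 16-byte (four-instruction) slot in the trap table.
    """
    return (tba & 0xFFFFF000) | ((tt & 0xFF) << 4)


def ticc_tt(trap_number):
    """Trap type written by a taken Ticc: trap_number + 128."""
    if not 0 <= trap_number <= 127:
        raise ValueError("software trap number must be in 0..127")
    return trap_number + 128


print(hex(trap_vector(0x40000000, 5)))           # 0x40000050
print(hex(trap_vector(0x40000000, ticc_tt(3))))  # 0x40000830
```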



5.3. Trap Priorities 

The following table shows the trap types, priorities, and assignments. 



Trap                            Priority    tt

reset                           1           -
instruction_access_exception    2           1
illegal_instruction             3           2
privileged_instruction          4           3
fp_disabled                     5           4
cp_disabled                     5           36
window_overflow                 6           5
window_underflow                7           6
mem_address_not_aligned         8           7
fp_exception                    9           8
cp_exception                    9           40
data_access_exception           10          9
tag_overflow                    11          10
trap_instruction (Ticc)         12          128-255
interrupt_level_15              13          31
interrupt_level_14              14          30
interrupt_level_13              15          29
interrupt_level_12              16          28
interrupt_level_11              17          27
interrupt_level_10              18          26
interrupt_level_9               19          25
interrupt_level_8               20          24
interrupt_level_7               21          23
interrupt_level_6               22          22
interrupt_level_5               23          21
interrupt_level_4               24          20
interrupt_level_3               25          19
interrupt_level_2               26          18
interrupt_level_1               27          17
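When several traps are detected during one instruction, the one with the numerically lowest priority value in the table above is taken. A minimal Python sketch of that selection follows; the `PRIORITY` dictionary simply transcribes the table, and the function name `trap_taken` is an assumption made for the example.

```python
PRIORITY = {
    "reset": 1,
    "instruction_access_exception": 2,
    "illegal_instruction": 3,
    "privileged_instruction": 4,
    "fp_disabled": 5,
    "cp_disabled": 5,
    "window_overflow": 6,
    "window_underflow": 7,
    "mem_address_not_aligned": 8,
    "fp_exception": 9,
    "cp_exception": 9,
    "data_access_exception": 10,
    "tag_overflow": 11,
    "trap_instruction": 12,
}
# Interrupt levels 15 down to 1 have priorities 13 up to 27.
for level in range(1, 16):
    PRIORITY["interrupt_level_%d" % level] = 28 - level


def trap_taken(pending):
    """Return the pending trap with the highest priority (lowest number)."""
    return min(pending, key=lambda name: PRIORITY[name])


print(trap_taken(["data_access_exception", "mem_address_not_aligned"]))
print(trap_taken(["interrupt_level_3", "interrupt_level_12"]))
```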



5.4. Trap Definition 

A trap causes the following action: 

It disables traps (ET <- 0).

It copies the S field of the PSR into the PS field and then sets the S field to 1.

It decrements the CWP by 1, modulo the number of implemented windows.

It saves the PC and nPC into r[17] and r[18], respectively, of the new window.

It sets the tt field of the TBR to the appropriate value.

If the trap is not a reset, it writes the PC with the contents of TBR, and the nPC with the
contents of TBR + 4. If the trap is a reset, it loads the PC with 0 and the nPC with 4.

NOTE

Unlike many other processors, the SPARC architecture does not
automatically save the PSR into memory during a trap. Instead,
it saves the volatile S field into the PSR itself and the remaining
fields are either altered in a reversible way (ET and CWP), or
should not be altered in a trap handler until the PSR has been
saved into memory.

The last two instructions of a trap handler should be a JMPL
followed by a RETT. This restores the PC, the nPC and the S bit
of the PSR.

Because the FPU and IU operate concurrently, the address that is saved from the PC as a result
of a floating-point exception may not be the address of the FPop that caused the exception. If a 
floating-point exception occurs, the first element in the FQ points to the FPop that caused the 
exception, and the remaining elements point to FPops that have been started by the FPU but 
have not yet completed. These can be re-executed or emulated. 

For additional information on trap handlers, see Appendix C. 
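The trap actions listed in this section can be traced with a small state model. The sketch below covers only non-reset traps and makes several assumptions for the sake of illustration: processor state is a plain dictionary, 8 windows are implemented, the tt field sits in bits 11:4 of the TBR, and r[17]/r[18] of the new window are modeled as the keys `r17` and `r18`.

```python
def take_trap(state, tt, nwindows=8):
    """Apply the non-reset trap actions to a copy of `state`."""
    s = dict(state)
    s["ET"] = 0                                  # disable traps
    s["PS"], s["S"] = s["S"], 1                  # save S in PS, enter supervisor mode
    s["CWP"] = (s["CWP"] - 1) % nwindows         # move to the next window
    s["r17"], s["r18"] = s["PC"], s["nPC"]       # save PC and nPC in the new window
    s["TBR"] = (s["TBR"] & 0xFFFFF000) | ((tt & 0xFF) << 4)  # set the tt field
    s["PC"], s["nPC"] = s["TBR"], s["TBR"] + 4   # vector to the handler
    return s


before = {"ET": 1, "S": 0, "PS": 0, "CWP": 0,
          "PC": 0x1000, "nPC": 0x1004, "TBR": 0x40000000}
after = take_trap(before, tt=2)  # illegal_instruction
print(after["CWP"], hex(after["PC"]), hex(after["r17"]), hex(after["r18"]))
```

Since the PSR itself is not written to memory, a handler in this model would recover the pre-trap privilege level from PS, which is what RETT does when it restores the S bit.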



5.5. Interrupt Detection 

As long as ET = 1, the IU checks for interrupts. It compares the external interrupt level (bp_IRL)
against the PIL field of the PSR, and if bp_IRL is greater than the PIL, or if bp_IRL is 15
(unmaskable), then a trap occurs at the level requested by bp_IRL.
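The recognition rule above amounts to a one-line test. The following Python sketch is illustrative; the function names are assumptions, and the tt mapping (16 plus the interrupt level) is read off the trap priority table in section 5.3.

```python
def interrupt_recognized(et, pil, bp_irl):
    """True if an external interrupt at level bp_irl is taken."""
    if et != 1:
        return False                       # interrupts are ignored while traps are disabled
    return bp_irl == 15 or bp_irl > pil    # level 15 cannot be masked by PIL


def interrupt_tt(level):
    """Trap type for interrupt_level_<level>, per the priority table."""
    return 16 + level


print(interrupt_recognized(1, 7, 8))    # True
print(interrupt_recognized(1, 7, 7))    # False
print(interrupt_recognized(1, 15, 15))  # True
print(interrupt_tt(15))                 # 31
```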



5.5.1. Implementation Note 

Processor implementations may ignore interrupts for multiple cycles even though ET = 1.



5.6. Floating-point/Coprocessor Exception Traps 

Floating-point/coprocessor exception traps are considered a separate class of traps because
they are both synchronous and asynchronous. They are asynchronous because they occur
sometime after the floating-point or coprocessor instruction that caused the exception. However,
they are synchronous because a floating-point or coprocessor instruction must be encountered in
the instruction stream before the trap is taken.

When the FPU or CP recognizes an exception condition, it enters an "exception_pending_mode"
state, and remains in this state until the IU takes the fp_exception trap. When the IU takes the
exception trap, the FPU leaves "exception_pending" state, and enters "exception_mode" state.
The FPU or coprocessor remains in the exception_mode state until the floating-point or
coprocessor queue has been emptied by execution of one or more STDFQ or STDCQ instructions.

The PC that corresponds to a floating-point or coprocessor exception always points to a
floating-point or coprocessor instruction. However, the exception itself is always due to a
previously executed floating-point or coprocessor instruction. The instruction and the value of
the PC from which it was fetched are in the floating-point (or coprocessor) queue.



5.7. Trap Descriptions 

The following paragraphs describe the various traps, and the conditions that cause them.

reset

A reset trap occurs when the IU leaves reset_mode and enters execute_mode. This is
controlled by the bp_reset_in signal. The IU enters reset_mode when bp_reset_in = 1, and
enters execute_mode when bp_reset_in = 0. Except in one situation, reset does not change
the value of the tt field of the TBR; the exception is when a return from trap instruction is
executed while traps are not enabled and the processor is not in supervisor mode (see
description of return from trap instruction in Appendix B). Also, a reset trap causes the IU to
begin execution at location 0, regardless of the value of the TBR.

Reset traps set the PSR S bit to 1 and the ET bit to 0. All other PSR fields, and all other
registers retain their values from the last execute_mode, except that on power-up they are
undefined.

instruction_access_exception

This trap occurs when bp_memory_access_exception = 1 for a memory address used in an
instruction fetch.

illegal_instruction

This trap occurs 1) when the UNIMP instruction is encountered, 2) when an unimplemented
instruction which is not an FPop or a CPop is encountered, or 3) when an instruction is
fetched which, if executed, would result in an illegal processor state (e.g. writing an illegal
CWP into the PSR). Unimplemented floating-point operate and unimplemented coprocessor
operate instructions generate fp_exception and cp_exception traps, respectively.

privileged_instruction

This trap occurs when a privileged instruction is encountered while the S bit in the PSR = 0.

fp_disabled

This trap occurs when an FPop, FBfcc, or a floating-point load or store is encountered while
the EF bit in the PSR = 0 or no FPU is present.

cp_disabled

This trap occurs when a CPop, CBccc, or a coprocessor load or store instruction is decoded
while the EC bit in the PSR = 0 or no coprocessor is present.

window_overflow

This trap occurs when a SAVE instruction would, if executed, cause the CWP to point to a
window marked invalid in the WIM.

window_underflow

This trap occurs when a RESTORE instruction would, if executed, cause the CWP to point to
a window marked invalid in the WIM.

mem_address_not_aligned

This trap occurs when a load, store or JMPL instruction would, if executed, generate a
memory address or a new PC value that is not properly aligned.

fp_exception

This trap occurs when the FPU is in exception_pending state and a floating-point instruction
(FP operate, floating-point load/store, FBfcc) is encountered in the instruction stream. The
type of exception is encoded in the ftt field of the FSR as described in the section Registers.

cp_exception

This trap occurs when the CP is in exception_pending state and a coprocessor instruction
(CP operate, coprocessor load/store, CBccc) is encountered in the instruction stream.

data_access_exception

This trap occurs when bp_memory_exception = 1 for a memory address that corresponds to a
data movement by a load or store instruction.

tag_overflow

This trap occurs when a TADDccTV or TSUBccTV instruction is executed which causes the
overflow bit of the integer condition codes to be set.

trap_instruction

This trap occurs when a taken Ticc instruction is executed.

interrupt_level<3:0>

External interrupts are controlled by the value of bp_IRL. A value of 0 indicates that no
interrupt is requested. Level 1 is the lowest priority interrupt and 15 is the highest. Interrupt
level 15 cannot be masked by the Processor Interrupt Level (PIL) field of the PSR. When ET
= 1, an external interrupt is recognized if bp_IRL = 15 or bp_IRL > PIL. When ET = 0 or
(bp_IRL ≠ 15 and bp_IRL ≤ PIL), no external interrupt is recognized.



A.1. Introduction

This appendix supports Appendix B, Instruction Descriptions. Every instruction description in
Appendix B includes a table that describes the suggested assembly language format for that
instruction. This appendix describes the notation used in the assembly language syntax
descriptions.

Understanding the use of type fonts is crucial to understanding the syntax descriptions in
Appendix B. Items in typewriter font are literals, to be entered exactly as they appear. Items in
italic font are metasymbols which are to be replaced by numeric or symbolic values when actual
SPARC assembly-language code is written. For example, asi would be replaced by a number
in the range of 0 to 255 (the value of the asi bits in the binary instruction), or by a symbol which
had been bound to such a number.

Subscripts on metasymbols further identify the placement of the operand in the generated binary
instruction. For example, reg_rs2 is a reg (i.e., register name) whose binary value will end up in
the rs2 field of the resulting instruction.

Register Names

reg     A reg is an Integer Unit register. It can have a value of:

            %0 through %31      all integer registers
            %g0 through %g7     global registers (same as %0 through %7)
            %o0 through %o7     out registers (same as %8 through %15)
            %l0 through %l7     local registers (same as %16 through %23)
            %i0 through %i7     in registers (same as %24 through %31)

        Subscripts further identify the placement of the operand in the binary instruction as one
        of:

            reg_rs1     rs1 field
            reg_rs2     rs2 field
            reg_rd      rd field

freg    A freg is a floating-point register. It can have a value from %f0 through %f31.
        Subscripts further identify the placement of the operand in the binary instruction as one
        of:

            freg_rs1    rs1 field
            freg_rs2    rs2 field
            freg_rd     rd field

creg    A creg is a coprocessor register. It can have a value from %c0 through %c31.
        Subscripts further identify the placement of the operand in the binary instruction as one
        of:

            creg_rs1    rs1 field
            creg_rd     rd field



Special Symbol Names 

Certain special symbols need to be written exactly as they appear in the syntax table. These
appear in typewriter font, and include a percent sign (%), also in typewriter font. The percent
sign is part of the symbol name; it must appear as part of the literal value.

The symbol names are:

    %psr    Processor State Register
    %wim    Window Invalid Mask register
    %tbr    Trap Base Register
    %y      Y register
    %fsr    Floating-point State Register
    %csr    Coprocessor State Register
    %fq     Floating-point Queue
    %cq     Coprocessor Queue
    %hi     Unary operator that extracts high 22 bits of its operand
    %lo     Unary operator that extracts low 10 bits of its operand
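The effect of the %hi and %lo operators on a 32-bit operand can be illustrated numerically. The Python functions below are only a model of the arithmetic (the names `hi` and `lo` are chosen for the example); the two results together cover all 32 bits, which is why a SETHI of the %hi part followed by an OR of the %lo part can build any 32-bit constant.

```python
def hi(value):
    """High 22 bits of a 32-bit operand, as extracted by %hi."""
    return (value & 0xFFFFFFFF) >> 10


def lo(value):
    """Low 10 bits of a 32-bit operand, as extracted by %lo."""
    return value & 0x3FF


x = 0x12345678
print(hex(hi(x)), hex(lo(x)))
# Recombining the two parts restores the original value.
print(hex((hi(x) << 10) | lo(x)))
```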



Values 

Some instructions use operands comprising values as follows: 

    simm13     A signed immediate constant that fits in 13 bits

    const22    A constant that fits in 22 bits

    asi        An alternate address space identifier (0 to 255)

Label

A sequence of characters, composed of alphabetic letters (a-z, A-Z [upper and lower case
distinct]), underscore (_), dollar sign ($), period (.), and decimal digits (0-9), which does not
begin with a decimal digit.

Some instructions offer a choice of operands. These are grouped as follows:

regaddr:
    reg_rs1
    reg_rs1 + reg_rs2

address:
    reg_rs1
    reg_rs1 + reg_rs2
    reg_rs1 + simm13
    reg_rs1 - simm13
    simm13
    simm13 + reg_rs1

reg_or_imm:
    reg_rs2
    simm13



B.1. Introduction

This appendix describes the SPARC architecture's instruction set. A more detailed, algorithmic
definition of the instruction set appears in Appendix C.

Related instructions are grouped into subsections. Each subsection consists of five parts:

(1) A table of the opcodes defined in the subsection with the values of the field(s) which
uniquely identify the instruction(s).

(2) An illustration of the applicable instruction format(s).

(3) A table of the suggested assembly language syntax. (The syntax notation is described in
Appendix A.)

(4) A description of the salient features, restrictions, and trap conditions.

(5) A list of the synchronous or floating-point/coprocessor traps which can occur as a
consequence of executing the instruction(s).

This section does not include any timing information (in either cycles or absolute time) since
timing is strictly implementation-dependent.

The following table lists all the instructions:

Opcode                   Name

LDSB (LDSBA†)            Load Signed Byte (from Alternate space)
LDSH (LDSHA†)            Load Signed Halfword (from Alternate space)
LDUB (LDUBA†)            Load Unsigned Byte (from Alternate space)
LDUH (LDUHA†)            Load Unsigned Halfword (from Alternate space)
LD (LDA†)                Load Word (from Alternate space)
LDD (LDDA†)              Load Doubleword (from Alternate space)

LDF                      Load Floating-point
LDDF                     Load Double Floating-point
LDFSR                    Load Floating-point State Register

LDC                      Load Coprocessor
LDDC                     Load Double Coprocessor
LDCSR                    Load Coprocessor State Register

STB (STBA†)              Store Byte (into Alternate space)
STH (STHA†)              Store Halfword (into Alternate space)
ST (STA†)                Store Word (into Alternate space)
STD (STDA†)              Store Doubleword (into Alternate space)

STF                      Store Floating-point
STDF                     Store Double Floating-point
STFSR                    Store Floating-point State Register
STDFQ†                   Store Double Floating-point Queue

STC                      Store Coprocessor
STDC                     Store Double Coprocessor
STCSR                    Store Coprocessor State Register
STDCQ†                   Store Double Coprocessor Queue

LDSTUB (LDSTUBA†)        Atomic Load-Store Unsigned Byte (in Alternate space)
SWAP (SWAPA†)            Swap r Register with Memory (in Alternate space)

ADD (ADDcc)              Add (and modify icc)
ADDX (ADDXcc)            Add with Carry (and modify icc)

TADDcc (TADDccTV)        Tagged Add and modify icc (and Trap on overflow)

SUB (SUBcc)              Subtract (and modify icc)
SUBX (SUBXcc)            Subtract with Carry (and modify icc)

TSUBcc (TSUBccTV)        Tagged Subtract and modify icc (and Trap on overflow)

MULScc                   Multiply Step and modify icc

AND (ANDcc)              And (and modify icc)
ANDN (ANDNcc)            And Not (and modify icc)
OR (ORcc)                Inclusive-Or (and modify icc)
ORN (ORNcc)              Inclusive-Or Not (and modify icc)
XOR (XORcc)              Exclusive-Or (and modify icc)
XNOR (XNORcc)            Exclusive-Nor (and modify icc)

SLL                      Shift Left Logical
SRL                      Shift Right Logical
SRA                      Shift Right Arithmetic

SETHI                    Set High 22 bits of r register

SAVE                     Save caller's window
RESTORE                  Restore caller's window

Bicc                     Branch on integer condition codes
FBfcc                    Branch on floating-point condition codes
CBccc                    Branch on coprocessor condition codes

CALL                     Call
JMPL                     Jump and Link

RETT†                    Return from Trap

Ticc                     Trap on integer condition codes

RDY                      Read Y register
RDPSR†                   Read Processor State Register
RDWIM†                   Read Window Invalid Mask register
RDTBR†                   Read Trap Base Register

WRY                      Write Y register
WRPSR†                   Write Processor State Register
WRWIM†                   Write Window Invalid Mask register
WRTBR†                   Write Trap Base Register

UNIMP                    Unimplemented instruction

IFLUSH                   Instruction cache Flush

FPop                     Floating-point Operate: FiTO(s,d,x), F(s,d,x)TOi,
                         FsTOd, FsTOx, FdTOs, FdTOx, FxTOs, FxTOd, FMOVs, FNEGs, FABSs,
                         FSQRT(s,d,x), FADD(s,d,x), FSUB(s,d,x), FMUL(s,d,x), FDIV(s,d,x),
                         FCMP(s,d,x), FCMPE(s,d,x)

CPop                     Coprocessor operate

† privileged instruction



B.2. Load Integer Instructions

opcode     op3       operation

LDSB       001001    Load Signed Byte
LDSBA†     011001    Load Signed Byte from Alternate space
LDSH       001010    Load Signed Halfword
LDSHA†     011010    Load Signed Halfword from Alternate space
LDUB       000001    Load Unsigned Byte
LDUBA†     010001    Load Unsigned Byte from Alternate space
LDUH       000010    Load Unsigned Halfword
LDUHA†     010010    Load Unsigned Halfword from Alternate space
LD         000000    Load Word
LDA†       010000    Load Word from Alternate space
LDD        000011    Load Doubleword
LDDA†      010011    Load Doubleword from Alternate space

† privileged instruction

Format (3):

    +----+-------+--------+-------+-----+----------+-------+
    | 11 |  rd   |  op3   |  rs1  | i=0 |   asi    |  rs2  |
    +----+-------+--------+-------+-----+----------+-------+
     31   29      24       18      13    12         4     0

    +----+-------+--------+-------+-----+------------------+
    | 11 |  rd   |  op3   |  rs1  | i=1 |      simm13      |
    +----+-------+--------+-------+-----+------------------+
     31   29      24       18      13    12               0





Suggested Assembly Language Syntax

    ldsb     [address], reg_rd
    ldsba    [regaddr] asi, reg_rd
    ldsh     [address], reg_rd
    ldsha    [regaddr] asi, reg_rd
    ldub     [address], reg_rd
    lduba    [regaddr] asi, reg_rd
    lduh     [address], reg_rd
    lduha    [regaddr] asi, reg_rd
    ld       [address], reg_rd
    lda      [regaddr] asi, reg_rd
    ldd      [address], reg_rd
    ldda     [regaddr] asi, reg_rd



Description: 

The load single integer instructions move either a byte, halfword, or word from memory into 
the r register defined by the/d field. A fetched byte or halfword is right-justified in rd and 
may be either zero-filled or sign-extended. 

The load double integer instructions (LDD, LDDA) move a doubieword from memory into an r 
register pair The most significant word at the effective memory address is moved into the 
even r register The least significant word at the effective memory address + 4 is moved into 
the odd r register. The least significant bit of the rd field is ignored. (Note that a load double 
with rd ~ modifies only r{1].) 

The effective address for a load instruction is either "r[rs1] + r[rs2]" If the /field is zero, or 
"rtrs1] + sign_ext(simm13)" If the /field is one. Instructions which load from an alternate 
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address space must have zero in the /field and the address space identifier to be used for 
the load in the as/ field. Otherwise the address space indicates either a user or system data 
space access, according to the S bit of the PSR. 

LD and LDA cause a mem_address_not_aligned trap if the effective address is not word- 
aligned; LDUH, LDSH, LDUHA, and LDSHA trap if the address is not halfword-aligned; and 
LDD and LDDA trap if the address is not doubleword-aligned. 

If a load single instruction traps, the destination register remains unchanged. 

If a load double instruction is trapped with a data access exception during the effective
address memory access, the destination registers remain unchanged. However, a specific
implementation might cause a data_access_exception trap during the effective address + 4
memory access, but not during the effective address access. Thus, the even destination r
register can be changed in this case. (Note that this cannot happen across a page boundary
because of the doubleword-alignment restriction.)



B.2.1. Implementation Note: 

On effective address + 4 accesses, the system should limit data_access_exceptions to non- 
restartable errors, such as uncorrectable memory errors. 



B.2.2. Programming Note 

The execution time of a load integer instruction may increase if the next instruction uses the
register specified by the rd field of the load instruction as a source operand (rs1 or rs2). In
the case of load doubleword instructions, this applies to both destination registers. Whether
the time increase occurs or not is implementation-dependent.

B.2.3. Programming Note 

When i = 1 and rs1 = 0, any location in the lowest or highest 4K bytes of an address space
can be accessed without using a register. 

Traps: 

illegal_instruction (load alternate space with i = 1)
privileged_instruction (load alternate space only)
mem_address_not_aligned (excluding LDSB, LDSBA, LDUB, and LDUBA) 
data_access_exception 
B.3. Load Floating-point Instructions



opcode    op3       operation

LDF       100000    Load Floating-point register
LDDF      100011    Load Double Floating-point register
LDFSR     100001    Load Floating-point State Register

Format (3):



    | 11 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 11 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0






Suggested Assembly Language Syntax 



    ld       [address], fregrd
    ldd      [address], fregrd
    ld       [address], %fsr



Description: 



The load single floating-point instruction (LDF) moves a word from memory into the f register
identified by the rd field.

The load double floating-point instruction (LDDF) moves a doubleword from memory into an
f register pair. The most significant word at the effective memory address is moved into the
even f register. The least significant word at the effective memory address + 4 is moved into
the odd f register. The least significant bit of the rd field is ignored.

The load floating-point state register instruction (LDFSR) waits for all FPops that have not
finished execution to complete and then loads a word from memory into the FSR.

The effective address for the load instruction is either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one.

LDF and LDFSR cause a mem_address_not_aligned trap if the effective address is not
word-aligned; and LDDF traps if the address is not doubleword-aligned. A load floating-point
instruction causes an fp_disabled trap if the EF field of the PSR is 0 or if no FPU is present.

If a load single floating-point instruction is trapped with a data access exception, the
destination f register either remains unchanged or is set to an implementation-defined
constant value.

If a load double floating-point instruction is trapped with a data access exception, either the
destination f registers remain unchanged or one or both are set to an implementation-defined
constant value.



B.3.1. Programming Note 

The execution time of a load floating-point instruction may increase if the next instruction
uses the register specified by the rd field of the load instruction as a source operand (rs1 or
rs2). In the case of load double floating-point instructions, this applies to both destination
registers. Whether the time increases or not is implementation-dependent.
B.3.2. Programming Note

When i = 1 and rs1 = 0, any location in the lowest or highest 4K bytes of an address space
can be accessed without using a register. 



Traps: 



fp_disabled 
fp_exception 

mem_address_not_aligned 
data_access_exception 
B.4. Load Coprocessor Instructions



opcode    op3       operation

LDC       110000    Load Coprocessor register
LDDC      110011    Load Double Coprocessor register
LDCSR     110001    Load Coprocessor State Register

Format (3):



    | 11 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 11 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0






Suggested Assembly Language Syntax 



    ld       [address], cregrd
    ldd      [address], cregrd
    ld       [address], %csr



Description: 

The load single coprocessor instruction (LDC) moves a word from memory into a coprocessor
register. The load double coprocessor instruction (LDDC) moves a doubleword from
memory into a coprocessor register pair. The load coprocessor state register instruction
(LDCSR) moves a word from memory into the Coprocessor State Register. The semantics
of these instructions depend on the implementation of the attached coprocessor.

The effective address for the load instruction is either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one.

LDC and LDCSR cause a mem_address_not_aligned trap if the effective address is not
word-aligned; and LDDC traps if the address is not doubleword-aligned. A load coprocessor
instruction causes a cp_disabled trap if the EC field of the PSR is 0 or if no coprocessor is
present.

If a load coprocessor instruction traps, the state of the coprocessor depends on its
implementation.



B.4.1. Implementation Note: 

On effective address + 4 accesses, the system should limit data_access_exceptions to
non-restartable errors, such as uncorrectable memory errors.



B.4.2. Programming Note 

The execution time of a load coprocessor instruction may increase if the next instruction
uses the register specified by the rd field of the load instruction as a source operand (rs1 or
rs2). In the case of load double coprocessor instructions, this applies to both destination
registers. Whether the time increases or not is implementation-dependent.
B.4.3. Programming Note

When i = 1 and rs1 = 0, any location in the lowest or highest 4K bytes of an address space
can be accessed without using a register. 



Traps: 



cp_disabled 
cp_exception 

mem_address_not_aligned 
data_access_exception 
B.5. Store Integer Instructions



opcode    op3       operation

STB       000101    Store Byte
STBA†     010101    Store Byte into Alternate space
STH       000110    Store Halfword
STHA†     010110    Store Halfword into Alternate space
ST        000100    Store Word
STA†      010100    Store Word into Alternate space
STD       000111    Store Doubleword
STDA†     010111    Store Doubleword into Alternate space

† privileged instruction

Format (3):



    | 11 | rd    | op3   | rs1   | i=0 | asi      | rs2   |
     31   29      24      18      13    12         4     0

    | 11 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0








Suggested Assembly Language Syntax 


    stb      regrd, [address]         synonyms: stub, stsb
    stba     regrd, [regaddr] asi     synonyms: stuba, stsba
    sth      regrd, [address]         synonyms: stuh, stsh
    stha     regrd, [regaddr] asi     synonyms: stuha, stsha
    st       regrd, [address]
    sta      regrd, [regaddr] asi
    std      regrd, [address]
    stda     regrd, [regaddr] asi





Description: 

The store single integer instructions move the word, the least significant halfword, or the
least significant byte from the r register specified by the rd field into memory.

The store double integer instructions (STD, STDA) move a doubleword from an r register pair
into memory. The most significant word in the even r register is written into memory at the
effective address and the least significant word in the following odd r register is written into
memory at the effective address + 4.

The effective address for a store instruction is either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one. Instructions which store to an alternate
address space must have zero in the i field and the address space identifier to be used for
the store in the asi field. Otherwise the address space indicates either a user or system data
space access, according to the S bit in the PSR.

ST and STA cause a mem_address_not_aligned trap if the effective address is not
word-aligned; STH and STHA trap if the address is not halfword-aligned; and STD and STDA
trap if the address is not doubleword-aligned.

If a store single instruction traps, memory remains unchanged. However, in the case of a
store double, an implementation might cause a data_access_exception trap during the
effective address + 4 memory access, but not during the effective address access. Thus,
data at the effective memory address can be changed in this case. (Note that this cannot
happen across a page boundary because of the doubleword-alignment restriction.)



B.5.1. Implementation Note: 

On effective address + 4 accesses, the system should limit data_access_exceptions to non- 
restartable errors, such as uncorrectable memory errors. 

B.5.2. Programming Note

When i = 1 and rs1 = 0, any location in the lowest or highest 4K bytes of memory can be
written without using a register.



Traps: 



illegal_instruction (store alternate with i = 1)
privileged_instruction (store alternate only)
mem_address_not_aligned (excluding STB and STBA) 
data_access_exception 
B.6. Store Floating-point Instructions



opcode    op3       operation

STF       100100    Store Floating-point
STDF      100111    Store Double Floating-point
STFSR     100101    Store Floating-point State Register
STDFQ†    100110    Store Double Floating-point Queue

† privileged instruction

Format (3):



    | 11 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 11 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0








Suggested Assembly Language Syntax 



    st       fregrd, [address]
    std      fregrd, [address]
    st       %fsr, [address]
    std      %fq, [address]



Description: 

The store single floating-point instruction (STF) moves the contents of the f register specified
by the rd field into memory.

The store double floating-point instruction (STDF) moves a doubleword from an f register
pair into memory. The most significant word in the even f register is written into memory at
the effective address and the least significant word in the odd f register is written into
memory at the effective address + 4.

The store floating-point queue instruction (STDFQ) stores the front entry of the Floating-point
Queue (FQ) into memory. The address part of the front entry is stored into memory at the
effective address, and the instruction part of the front entry at the effective address + 4. If
the FPU is in exception_mode, the queue is then advanced to the next entry, or it becomes
empty (as indicated by the qne bit in the FSR).

The store floating-point state register instruction (STFSR) waits for all FPops that have not
finished execution to complete and then writes the FSR into memory.

The effective address for a store instruction is either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one.

STF and STFSR cause a mem_address_not_aligned trap if the address is not word-aligned,
and STDF and STDFQ trap if the address is not doubleword-aligned. A store floating-point
instruction causes an fp_disabled trap if the EF field of the PSR is 0 or if the FPU is not
present.

If a store single floating-point instruction traps, memory remains unchanged. However, in
the case of a store double, an implementation may cause a data_access_exception trap
during the effective address + 4 memory access, but not during the effective address access.
Data at the effective memory address can be changed in this case. (Note that this cannot
happen across a page boundary because of the doubleword-alignment restriction.)
B.6.1. Implementation Note:



On effective address + 4 accesses, the system should limit data_access_exceptions to non- 
restartable errors, such as uncorrectable memory errors. 



Traps: 



fp_disabled 

fp_exception 

privileged_instruction (STDFQ only)

mem_address_not_aligned

data_access_exception 
B.7. Store Coprocessor Instructions



opcode    op3       operation

STC       110100    Store Coprocessor
STDC      110111    Store Double Coprocessor
STCSR     110101    Store Coprocessor State Register
STDCQ†    110110    Store Double Coprocessor Queue

† privileged instruction

Format (3):



    | 11 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 11 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0








Suggested Assembly Language Syntax 



    st       cregrd, [address]
    std      cregrd, [address]
    st       %csr, [address]
    std      %cq, [address]



Description: 

The store single coprocessor instruction (STC) moves the contents of a coprocessor register
into memory. The store double coprocessor instruction (STDC) moves the contents of a
coprocessor register pair into memory. The store coprocessor state register instruction
(STCSR) moves the contents of the coprocessor state register into memory. The store
double coprocessor queue instruction (STDCQ) moves the front entry of the coprocessor queue
into memory. The semantics of these instructions depend on the implementation of the
attached coprocessor, if any.

The effective address for a store instruction is either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one.

STC and STCSR cause a mem_address_not_aligned trap if the address is not word-aligned,
and STDC and STDCQ trap if the address is not doubleword-aligned. A store coprocessor
instruction causes a cp_disabled trap if the EC field of the PSR is 0 or if no coprocessor is
present.

If a store single coprocessor instruction traps, memory remains unchanged. However, in the
case of a store double, an implementation might cause a data_access_exception trap during
the effective address + 4 memory access, but not during the effective address access. Thus,
data at the effective memory address can be changed in this case. (Note that this cannot
happen across a page boundary because of the doubleword-alignment restriction.)



B.7.1. Implementation Note: 

On effective address + 4 accesses, the system should limit data_access_exceptions to non- 
restartable errors, such as uncorrectable memory errors. 
Traps:



cp_disabled 

cp_exception 

privileged_instruction (STDCQ only)

mem_address_not_aligned

data_access_exception 
B.8. Atomic Load-Store Unsigned Byte Instructions



opcode      op3       operation

LDSTUB      001101    Atomic Load-Store Unsigned Byte
LDSTUBA†    011101    Atomic Load-Store Unsigned Byte into Alternate space

† privileged instruction

Format (3):



    | 11 | rd    | op3   | rs1   | i=0 | asi      | rs2   |
     31   29      24      18      13    12         4     0

    | 11 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0








Suggested Assembly Language Syntax 



    ldstub   [address], regrd
    ldstuba  [regaddr] asi, regrd



Description: 

The atomic load-store instructions move a byte from memory into an r register identified by
the rd field and then rewrite the same byte in memory to all ones without allowing intervening
asynchronous traps. In a multiprocessor system, two or more processors executing atomic
load-store instructions addressing the same byte simultaneously are guaranteed to execute
them in some serial order.

The effective address of an atomic load-store is either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one. LDSTUBA must have zero in the i field, or an
illegal_instruction trap occurs. The address space identifier used for the memory accesses is
taken from the asi field. For LDSTUB, the address space indicates either a user or system
data space access, according to the S bit in the PSR.

If an atomic load-store instruction traps, memory remains unchanged. However, an
implementation may cause a data_access_exception trap during the store memory access, but
not during the load access. In this case, the destination register can be changed.
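
One common use of an atomic load-store is a simple spin lock: a byte holds zero when the lock is free, and whoever reads zero from LDSTUB owns the lock. The Python sketch below simulates that idea. It is a model under stated assumptions, not SPARC code: the Memory class, its ldstub method, and the acquire and release helpers are hypothetical, and a threading.Lock stands in for the hardware guarantee that the read and the write are indivisible.

```python
import threading
import time

class Memory:
    """Byte-addressable memory with an LDSTUB-like primitive (simulation only)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self._bus = threading.Lock()   # stands in for hardware atomicity

    def ldstub(self, address):
        """Return the old byte and set the byte to all ones, as one indivisible step."""
        with self._bus:
            old = self.data[address]
            self.data[address] = 0xFF
            return old

def acquire(mem, lock_addr):
    """Spin until LDSTUB returns zero, meaning this caller set the byte first."""
    while mem.ldstub(lock_addr) != 0:
        time.sleep(0)                  # yield to other threads while waiting

def release(mem, lock_addr):
    """Free the lock with an ordinary byte store of zero."""
    mem.data[lock_addr] = 0

mem = Memory(16)
shared = {"count": 0}

def worker():
    for _ in range(1000):
        acquire(mem, 0)
        shared["count"] += 1           # critical section
        release(mem, 0)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because only one caller at a time can observe the zero byte, the increments in the critical section do not interleave in this model.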

B.8.1. Implementation Note: 

The system should limit data_access_exceptions on the store access to non-restartable 
errors, such as protection violation or uncorrectable memory errors. 



B.8.2. Programming Note 

When i = 1 and rs1 = 0, any location in the lowest or highest 4K bytes of memory can be
accessed without using a register. 

Traps: 

illegal_instruction (LDSTUBA with i = 1 only)
privileged_instruction (LDSTUBA only)
data_access_exception 
B.9. SWAP r Register with Memory





opcode    op3       operation

SWAP      001111    SWAP r register with memory
SWAPA†    011111    SWAP r register with Alternate space memory

† privileged instruction

Format (3):








    | 11 | rd    | op3   | rs1   | i=0 | asi      | rs2   |
     31   29      24      18      13    12         4     0

    | 11 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0



Suggested Assembly Language Syntax 



swap [source], regrd 
swapa [regsource] asi, regrd 



Description: 

The swap instructions exchange the r register identified by the rd field with the contents of
the addressed memory location. This is performed atomically without allowing asynchronous
traps. In a multiprocessor system, two or more processors issuing swap instructions
simultaneously are guaranteed to get results corresponding to executing the instructions
serially, in some order.

The effective address of the swap instruction is either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one. SWAPA must have zero in the i field or an
illegal_instruction trap occurs. The address space identifier used for the memory accesses is
taken from the asi field. For SWAP, the address space indicates either a user or a system
data space access, according to the S bit in the PSR.

These instructions cause a mem_address_not_aligned trap if the effective address is not
word-aligned.

If a swap instruction traps, memory remains unchanged.



B.9.1. Programming Note 

When i = 1 and rs1 = 0, any location in the lowest or highest 4K bytes of memory can be
written without using a register. 

Traps: 

illegal_instruction (SWAPA with i = 1 only)
privileged_instruction (SWAPA only)
data_access_exception 
B.10. Add Instructions

opcode    op3       operation

ADD       000000    Add
ADDcc     010000    Add and modify icc
ADDX      001000    Add with Carry
ADDXcc    011000    Add with Carry and modify icc

Format (3):



    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0






Suggested Assembly Language Syntax 



    add      regrs1, reg_or_imm, regrd
    addcc    regrs1, reg_or_imm, regrd
    addx     regrs1, reg_or_imm, regrd
    addxcc   regrs1, reg_or_imm, regrd



Description: 

ADD and ADDcc compute either "r[rs1] + r[rs2]" if the i field is zero, or "r[rs1] +
sign_ext(simm13)" if the i field is one, and place the result in the r register specified in the rd
field.

ADDX and ADDXcc add the PSR's carry (c) bit also; that is, they compute "r[rs1] + r[rs2] +
c" or "r[rs1] + sign_ext(simm13) + c" and place the result in the r register specified in the rd
field.

ADDcc and ADDXcc modify all the integer condition codes.
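
The carry bit is what lets a 64-bit addition be built from two 32-bit instructions: an ADDcc on the low words followed by an ADDX on the high words. The Python sketch below models that pairing. The function names addcc and add64 are hypothetical, and the condition-code computation follows the usual two's-complement definitions of N, Z, V and C rather than anything quoted from this section.

```python
MASK32 = 0xFFFFFFFF

def addcc(a, b, carry_in=0):
    """32-bit add with optional carry in; returns (result, n, z, v, c)."""
    total = a + b + carry_in
    result = total & MASK32
    n = result >> 31                       # sign bit of the result
    z = int(result == 0)
    # Signed overflow: operands share a sign that the result does not.
    v = int((a >> 31) == (b >> 31) and (result >> 31) != (a >> 31))
    c = total >> 32                        # carry out of bit 31
    return result, n, z, v, c

def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high word, low word) pairs."""
    lo, _n, _z, _v, c = addcc(a_lo, b_lo)          # models: addcc
    hi, *_flags = addcc(a_hi, b_hi, carry_in=c)    # models: addx
    return hi, lo

carry_example = add64(0x00000000, 0xFFFFFFFF, 0x00000000, 0x00000001)
overflow_example = addcc(0x7FFFFFFF, 1)
```

In carry_example the low-word addition wraps to zero and the carry propagates into the high word. In overflow_example the result is negative with V set, because two positive operands produced a result with the sign bit on.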

Traps: 

(none) 
B.11. Tagged Add Instructions



opcode      op3       operation

TADDcc      100000    Tagged Add and modify icc
TADDccTV    100010    Tagged Add, modify icc and Trap on Overflow

Format (3):



    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0








Suggested Assembly Language Syntax 



    taddcc    regrs1, reg_or_imm, regrd
    taddcctv  regrs1, reg_or_imm, regrd



Description: 

These instructions compute either "r[rs1] + r[rs2]" if the i field is zero, or "r[rs1] +
sign_ext(simm13)" if the i field is one. An overflow condition exists if bit 1 or bit 0 of either
operand is not zero, or if the addition generates an arithmetic overflow.

If a TADDccTV causes an overflow condition, a tag_overflow trap is generated and the
destination register and condition codes remain unchanged. If a TADDccTV does not cause an
overflow condition, all the integer condition codes are updated (in particular, the overflow bit
(v) is set to 0) and the result of the addition is written into the r register specified by the rd
field.

If a TADDcc causes an overflow condition, the overflow bit (v) of the PSR is set; if it does not
cause an overflow, it is cleared. In either case, the remaining integer condition codes are
also updated and the result of the addition is written into the r register specified by the rd
field.
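
The overflow rule for tagged addition can be modeled as follows. This is a sketch only: taddcc, taddcctv and the TagOverflow exception are hypothetical names, and the example treats the low two bits as a type tag that is zero for integers, which is one plausible use of these instructions rather than something this section prescribes.

```python
MASK32 = 0xFFFFFFFF

class TagOverflow(Exception):
    """Stands in for the tag_overflow trap."""

def taddcc(a, b):
    """Return (result, v); v is 1 on a tag mismatch or an arithmetic overflow."""
    result = (a + b) & MASK32
    arithmetic = (a >> 31) == (b >> 31) and (result >> 31) != (a >> 31)
    tagged = ((a | b) & 0x3) != 0          # bit 1 or bit 0 of either operand set
    return result, int(arithmetic or tagged)

def taddcctv(a, b):
    """Like taddcc, but raise instead of returning when overflow is detected."""
    result, v = taddcc(a, b)
    if v:
        raise TagOverflow("tag_overflow")
    return result

three = 3 << 2                             # integer 3 with a zero tag
five = 5 << 2                              # integer 5 with a zero tag
tagged_sum = taddcctv(three, five)         # integer 8 with a zero tag

try:
    taddcctv(three, five | 0x1)            # second operand carries a nonzero tag
    trapped = False
except TagOverflow:
    trapped = True
```

The second call raises because one operand has a nonzero tag, so in this model the destination value is never produced, matching the rule that the destination register is unchanged when TADDccTV traps.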

Traps: 

tag_overflow (TADDccTV only) 
B.12. Subtract Instructions

opcode    op3       operation

SUB       000100    Subtract
SUBcc     010100    Subtract and modify icc
SUBX      001100    Subtract with Carry
SUBXcc    011100    Subtract with Carry and modify icc

Format (3):



    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0








Suggested Assembly Language Syntax 



    sub      regrs1, reg_or_imm, regrd
    subcc    regrs1, reg_or_imm, regrd
    subx     regrs1, reg_or_imm, regrd
    subxcc   regrs1, reg_or_imm, regrd



Description: 

These instructions compute either "r[rs1] - r[rs2]" if the i field is zero, or "r[rs1] -
sign_ext(simm13)" if the i field is one, and place the result in the r register specified in the rd
field.

SUBX and SUBXcc ("SUBtract eXtended") also subtract the PSR's carry (c) bit; that is, they
compute "r[rs1] - r[rs2] - c" or "r[rs1] - sign_ext(simm13) - c" and place the result in the r
register specified in the rd field.

SUBcc and SUBXcc modify all the integer condition codes.



B.12.1. Programming Note 

A SUBcc with rd = 0 can be used for signed and unsigned integer compare.
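
The compare idiom works because the condition codes left by the subtraction encode both orderings at once. The Python sketch below models this; subcc is a hypothetical helper, its operands are 32-bit patterns, and the flag definitions (C as borrow, V as signed overflow) are the conventional ones, stated here as assumptions rather than quoted from this section.

```python
MASK32 = 0xFFFFFFFF

def subcc(a, b):
    """32-bit subtract; returns (result, n, z, v, c) with c meaning borrow."""
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    # Signed overflow: operands differ in sign and the result's sign differs from a's.
    v = int((a >> 31) != (b >> 31) and (result >> 31) != (a >> 31))
    c = int(a < b)                         # borrow when a is below b, unsigned
    return result, n, z, v, c

# Compare the pattern 0xFFFFFFFF with 1. The result itself is discarded,
# as it would be when rd = 0.
_discarded, n, z, v, c = subcc(0xFFFFFFFF, 1)

signed_less = bool(n ^ v)                  # -1 < 1 when read as signed
unsigned_less = bool(c)                    # 4294967295 < 1 is false when unsigned
```

The same flags therefore answer the signed question through N xor V and the unsigned question through C, which is why a single compare can feed either a signed or an unsigned conditional branch.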
Traps: 
(none) 
B.13. Tagged Subtract Instructions





opcode      op3       operation

TSUBcc      100001    Tagged Subtract and modify icc
TSUBccTV    100011    Tagged Subtract, modify icc and Trap on Overflow

Format (3):








    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0



Suggested Assembly Language Syntax 



    tsubcc    regrs1, reg_or_imm, regrd
    tsubcctv  regrs1, reg_or_imm, regrd



Description: 

These instructions compute either "r[rs1] - r[rs2]" if the i field is zero, or "r[rs1] -
sign_ext(simm13)" if the i field is one. An overflow condition exists if bit 1 or bit 0 of either
operand is not zero, or if the subtraction generates an arithmetic overflow.

If a TSUBccTV causes an overflow condition, a tag_overflow trap is generated and the
destination register and condition codes remain unchanged. If a TSUBccTV does not cause an
overflow condition, the integer condition codes are updated (in particular, the overflow bit (v)
is set to 0) and the result of the subtraction is written into the r register specified by the rd
field.

If a TSUBcc causes an overflow condition, the overflow bit (v) of the PSR is set; if it does not
cause an overflow, it is cleared. In either case, the remaining integer condition codes are
also updated and the result of the subtraction is written into the r register specified by the rd
field.

Traps: 

tag_overflow (TSUBccTV only) 
B.14. Multiply Step Instruction

opcode    op3       operation

MULScc    100100    Multiply Step and modify icc

Format (3):



    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0








Suggested Assembly Language Syntax 



    mulscc   regrs1, reg_or_imm, regrd



Description: 

The multiply step instruction can be used to generate the 64-bit product of two signed or
unsigned words (see Appendix E). MULScc works as follows:

1. The value obtained by shifting "r[rs1]" (the incoming partial product) right by one bit and
   replacing its high-order bit by "N xor V" (the sign of the previous partial product) is
   computed.

2. If the least significant bit of the Y register (the multiplier) is set, the value from step (1) is
   added to the multiplicand. The multiplicand is "r[rs2]" if the i field is zero or is
   "sign_ext(simm13)" if the i field is one. If the LSB of the Y register is not set, then zero
   is added to the value from step (1).

3. The result from step (2) is written into "r[rd]" (the outgoing partial product). The PSR's
   integer condition codes are updated according to the addition performed in step (2).

4. The Y register (the multiplier) is shifted right by one bit and its high-order bit is replaced
   by the least significant bit of "r[rs1]" (the incoming partial product).
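
The four steps above can be modeled directly. In the Python sketch below, mulscc_step follows the numbered steps, while the multiply driver is an assumption of this example: it clears N and V, loads Y with the multiplier, runs 32 steps, and then runs one extra step with a zero multiplicand to shift the final partial product into place. It is restricted to multiplicands below 2^31 so that no sign correction is needed. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def mulscc_step(rs1_val, multiplicand, y, n, v):
    """One multiply step; returns (rd_val, new_y, n, z, v, c)."""
    # Step 1: shift the incoming partial product right, inserting N xor V at bit 31.
    op1 = ((n ^ v) << 31) | (rs1_val >> 1)
    # Step 2: add the multiplicand only if the LSB of Y is set.
    op2 = multiplicand if (y & 1) else 0
    total = op1 + op2
    result = total & MASK32
    # Step 3: the condition codes follow the addition.
    new_n = result >> 31
    new_z = int(result == 0)
    new_v = int((op1 >> 31) == (op2 >> 31) and (result >> 31) != (op1 >> 31))
    new_c = total >> 32
    # Step 4: shift Y right, inserting the LSB of the incoming partial product.
    new_y = ((rs1_val & 1) << 31) | (y >> 1)
    return result, new_y, new_n, new_z, new_v, new_c

def multiply(multiplicand, multiplier):
    """64-bit product of a multiplicand below 2**31 and a 32-bit unsigned multiplier."""
    if not (0 <= multiplicand < 2**31 and 0 <= multiplier <= MASK32):
        raise ValueError("operands outside the range this sketch handles")
    partial, y, n, v = 0, multiplier, 0, 0
    for _ in range(32):
        partial, y, n, _z, v, _c = mulscc_step(partial, multiplicand, y, n, v)
    # Final step with a zero multiplicand: shifts the last bit into Y.
    partial, y, n, _z, v, _c = mulscc_step(partial, 0, y, n, v)
    return (partial << 32) | y             # high word in partial, low word in Y

small_product = multiply(6, 7)
large_product = multiply(0x7FFFFFFF, 0xFFFFFFFF)
```

After the loop the high word of the product sits in the partial-product register and the low word has been shifted into Y, bit by bit, through step 4.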

Traps: 

(none) 
B.15. Logical Instructions



opcode    op3       operation

AND       000001    And
ANDcc     010001    And and modify icc
ANDN      000101    And Not
ANDNcc    010101    And Not and modify icc
OR        000010    Inclusive Or
ORcc      010010    Inclusive Or and modify icc
ORN       000110    Inclusive Or Not
ORNcc     010110    Inclusive Or Not and modify icc
XOR       000011    Exclusive Or
XORcc     010011    Exclusive Or and modify icc
XNOR      000111    Exclusive Nor
XNORcc    010111    Exclusive Nor and modify icc



Format (3): 



    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0



Suggested Assembly Language Syntax 


    and      regrs1, reg_or_imm, regrd
    andcc    regrs1, reg_or_imm, regrd
    andn     regrs1, reg_or_imm, regrd
    andncc   regrs1, reg_or_imm, regrd
    or       regrs1, reg_or_imm, regrd
    orcc     regrs1, reg_or_imm, regrd
    orn      regrs1, reg_or_imm, regrd
    orncc    regrs1, reg_or_imm, regrd
    xor      regrs1, reg_or_imm, regrd
    xorcc    regrs1, reg_or_imm, regrd
    xnor     regrs1, reg_or_imm, regrd
    xnorcc   regrs1, reg_or_imm, regrd



Description: 

These instructions implement the bitwise logical operations. They compute either "r[rs1] op
r[rs2]" if the i field is zero, or "r[rs1] op sign_ext(simm13)" if the i field is one (op = and, and
not, or, or not, xor, xnor).

ANDcc, ANDNcc, ORcc, ORNcc, XORcc and XNORcc modify all the integer condition codes
as described in the section Registers.
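
As a quick reference, the six operations can be written out on 32-bit values. The sketch assumes the usual reading of the "not" forms, in which the second operand is complemented before the AND or OR, and XNOR is the complement of XOR. The LOGICAL_OPS table and the logical function are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

# Each entry maps a mnemonic to the operation on (first operand, second operand).
LOGICAL_OPS = {
    "and":  lambda a, b: a & b,
    "andn": lambda a, b: a & ~b,     # and with complemented second operand
    "or":   lambda a, b: a | b,
    "orn":  lambda a, b: a | ~b,     # or with complemented second operand
    "xor":  lambda a, b: a ^ b,
    "xnor": lambda a, b: ~(a ^ b),   # complement of exclusive or
}

def logical(op, a, b):
    """Apply one logical operation and truncate the result to 32 bits."""
    return LOGICAL_OPS[op](a, b) & MASK32

cleared = logical("andn", 0x000000FF, 0x0000000F)   # clear the low four bits
inverted = logical("xnor", 0x12345678, 0)           # xnor with zero complements
```

ANDN is handy for clearing a set of bits given a mask, and XNOR with a zero operand yields the bitwise complement of the other operand.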

Traps: (none) 
B.16. Shift Instructions

opcode    op3       operation

SLL       100101    Shift Left Logical
SRL       100110    Shift Right Logical
SRA       100111    Shift Right Arithmetic

Format (3):



    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | ignored  | shcnt |
     31   29      24      18      13    12         4     0



Suggested Assembly Language Syntax 



    sll      regrs1, reg_or_imm, regrd
    srl      regrs1, reg_or_imm, regrd
    sra      regrs1, reg_or_imm, regrd



Description: 

The shift count for these instructions is the least significant five bits of either "r[rs2]" if the i
field is zero, or "simm13" if the i field is one. (The least significant five bits of "simm13" are
called "shcnt" in the above format.)

SLL shifts the value of "r[rs1]" left by the number of bits implied by the shift count.

SRL and SRA shift the value of "r[rs1]" right by the number of bits implied by the shift count.

SLL and SRL replace vacated positions with zeroes, whereas SRA fills vacated positions
with the most significant bit of "r[rs1]". No shift occurs when the shift count is zero.

All of these instructions place the shifted result in the r register specified in the rd field.

These instructions do not modify the condition codes. 
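
The three shifts differ only in direction and in what fills the vacated bits. The following Python sketch models them on 32-bit patterns; sll, srl and sra are hypothetical helper names, and each one masks the count to five bits as the description requires.

```python
MASK32 = 0xFFFFFFFF

def sll(value, count):
    """Shift left logical: zero fill on the right, count taken modulo 32."""
    return (value << (count & 31)) & MASK32

def srl(value, count):
    """Shift right logical: zero fill on the left."""
    return (value & MASK32) >> (count & 31)

def sra(value, count):
    """Shift right arithmetic: vacated bits copy the original bit 31."""
    count &= 31
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32

wrapped_count = sll(1, 33)                # only the low five bits of 33 are used
logical_right = srl(0x80000000, 31)
arithmetic_right = sra(0x80000000, 31)
```

A count of 33 behaves like a count of 1, and the two right shifts of 0x80000000 differ only in whether the sign bit is replicated.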



B.16.1. Programming Note 

"Arithmetic left shift by 1 (and calculate overflow)" can be implemented with an ADDcc
instruction.



Traps: 
(none) 
B.17. SETHI Instruction







opcode    op    op2    operation

SETHI     00    100    Set High

Format (2):

    | 00 | rd    | 100 | imm22                  |
     31   29      24    21                     0







Suggested Assembly Language Syntax 






    sethi    const22, regrd
    sethi    %hi(value), regrd




Description: 

















SETHI zeroes the least significant 10 bits of "r[rd]" and replaces its high-order 22 bits with
imm22.

The condition codes are not affected. 
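
Because SETHI supplies the upper 22 bits and a 13-bit immediate can supply the remaining 10, a full 32-bit constant can be built in two instructions. The Python sketch below models that split. The sethi function follows the description above; hi mirrors the %hi(value) operator from the syntax table, while lo and the follow-up OR are assumptions of this example drawn from common assembler practice, not from this section.

```python
MASK32 = 0xFFFFFFFF

def sethi(imm22):
    """Place imm22 in bits 31..10 and zero bits 9..0."""
    return (imm22 & 0x3FFFFF) << 10

def hi(value):
    """Upper 22 bits of a 32-bit value, as %hi(value) would supply."""
    return (value & MASK32) >> 10

def lo(value):
    """Lower 10 bits of a value (hypothetical counterpart to hi)."""
    return value & 0x3FF

value = 0x12345678
rd = sethi(hi(value))        # models: sethi %hi(value), rd
upper_only = rd
rd = rd | lo(value)          # models a following or with the low 10 bits
```

After the first step rd holds the constant with its low 10 bits cleared, and the OR fills them in.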



B.17.1. Programming Note 

It is suggested that sethi 0, %0 be used as the preferred NOP, since it will not cause an
increase in execution time if it follows a load instruction.



Traps: 
(none) 
B.18. SAVE and RESTORE Instructions



opcode     op3       operation

SAVE       111100    Save caller's window
RESTORE    111101    Restore caller's window

Format (3):



    | 10 | rd    | op3   | rs1   | i=0 | ignored  | rs2   |
     31   29      24      18      13    12         4     0

    | 10 | rd    | op3   | rs1   | i=1 | simm13           |
     31   29      24      18      13    12               0



Suggested Assembly Language Syntax 



    save     regrs1, reg_or_imm, regrd
    restore  regrs1, reg_or_imm, regrd



Description: 

The SAVE instruction subtracts one from the CWP (modulo the number of implemented
windows) and compares this value, the "new_CWP," against the Window Invalid Mask (WIM)
register. If the WIM bit corresponding to the new_CWP is set, "(WIM and 2^new_CWP) = 1,"
then a window_overflow trap is generated. If the WIM bit corresponding to the new_CWP is
reset, then a window_overflow trap is not generated and new_CWP is written into CWP.
This causes the active window to become the previous window, thereby saving the caller's
window.

The RESTORE instruction adds one to the CWP (modulo the number of implemented
windows) and compares this value, the "new_CWP," against the Window Invalid Mask (WIM)
register. If the WIM bit corresponding to the new_CWP is set, "(WIM and 2^new_CWP) = 1,"
then a window_underflow trap is generated. If the WIM bit corresponding to the new_CWP
is reset, then a window_underflow trap is not generated and new_CWP is written into CWP.
This causes the previous window to become the active window, thereby restoring the
caller's window.

Furthermore, if an overflow or underflow trap is not generated, SAVE and RESTORE
behave like normal ADD instructions, except that the operands "r[rs1]" or "r[rs2]" are read
from the old window (i.e., the window addressed by the original CWP) and the result is
written into "r[rd]" of the new window (i.e., the window addressed by new_CWP).

Note that CWP arithmetic is performed modulo the number of implemented windows
(NWINDOWS).
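
The window-pointer arithmetic described above can be modelled in a few lines of Python. This is only an illustrative sketch, not part of the architecture definition: the names WindowTrap and rotate_window are hypothetical, and NWINDOWS = 8 is an assumed value, since the number of implemented windows is implementation-dependent.

```python
NWINDOWS = 8  # assumed for illustration; the real value is implementation-dependent


class WindowTrap(Exception):
    """Stands in for a window_overflow or window_underflow trap."""


def rotate_window(op, cwp, wim, nwindows=NWINDOWS):
    """Return new_CWP for a SAVE or RESTORE, or raise WindowTrap.

    op  -- "save" (CWP - 1) or "restore" (CWP + 1)
    cwp -- current window pointer
    wim -- Window Invalid Mask, given as an integer bit mask
    """
    step = -1 if op == "save" else 1
    new_cwp = (cwp + step) % nwindows        # arithmetic is modulo NWINDOWS
    if (wim >> new_cwp) & 1:                 # WIM bit for new_CWP is set
        raise WindowTrap("window_overflow" if op == "save" else "window_underflow")
    return new_cwp                           # no trap: new_CWP is written into CWP
```

For example, with window 1 marked invalid, a SAVE from window 0 wraps around to window 7, while a SAVE from window 2 would trap instead of entering window 1.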



Traps: 



window_overflow (SAVE only) 
window_underflow (RESTORE only) 
B.19. Branch on Integer Condition Instructions



opcode  cond  operation                                                icc test

BA      1000  Branch Always                                            1
BN      0000  Branch Never                                             0
BNE     1001  Branch on Not Equal                                      not Z
BE      0001  Branch on Equal                                          Z
BG      1010  Branch on Greater                                        not (Z or (N xor V))
BLE     0010  Branch on Less or Equal                                  Z or (N xor V)
BGE     1011  Branch on Greater or Equal                               not (N xor V)
BL      0011  Branch on Less                                           N xor V
BGU     1100  Branch on Greater Unsigned                               not (C or Z)
BLEU    0100  Branch on Less or Equal Unsigned                         (C or Z)
BCC     1101  Branch on Carry Clear (Greater than or Equal, Unsigned)  not C
BCS     0101  Branch on Carry Set (Less than, Unsigned)                C
BPOS    1110  Branch on Positive                                       not N
BNEG    0110  Branch on Negative                                       N
BVC     1111  Branch on Overflow Clear                                 not V
BVS     0111  Branch on Overflow Set                                   V



Format (2):

| 00 | a | cond | 010 |        disp22        |
 31   29  28     24    21                   0



Suggested Assembly Language Syntax

ba{,a}     label
bn{,a}     label
bne{,a}    label     synonym: bnz
be{,a}     label     synonym: bz
bg{,a}     label
ble{,a}    label
bge{,a}    label
bl{,a}     label
bgu{,a}    label
bleu{,a}   label
bcc{,a}    label     synonym: bgeu
bcs{,a}    label     synonym: blu
bpos{,a}   label
bneg{,a}   label
bvc{,a}    label
bvs{,a}    label

* * * NOTE * * *

To set the "annul" bit for Bicc instructions, append an (optional)
",a" to the opcode. For example, use "bgu,a label". The
preceding table indicates that the ",a" is optional by enclosing it
in braces ({}).
Description:



A Bicc instruction (except BA and BN) evaluates the integer condition codes (icc)
according to the cond field. If the condition codes evaluate to true, the branch is taken
and the instruction causes a PC-relative, delayed control transfer to the address "PC +
(4 * sign_ext(disp22))." If the condition codes evaluate to false, the branch is not taken.
If the branch is not taken and the a (annul) field is set, the delay instruction is not
executed (annulled). If the branch is taken, the annul field is ignored. (Annulment, delay
instructions, and delayed control transfers are described further in the section
Instructions.)

BN (Branch Never) acts like a "NOP," except that, if the annul field is one, the delay
instruction is not executed (annulled). If the annul field is zero, the delay instruction is
executed.

BA (Branch Always) causes a transfer of control, irrespective of the value of the
condition code bits. If the annul field is one, the delay instruction is not executed (annulled).
If the annul field is zero, the delay instruction is executed.

* * * NOTE * * *

Except for BA, all Bicc instructions with a=1 annul the delay
instruction when the branch is not taken. However, BA with a=1
does the reverse: it annuls the delay instruction even though the
branch is taken.

The delay instruction of a Bicc, other than a BA, should not be a
delayed control-transfer instruction.
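
The condition test, target computation, and annul rule described above can be sketched in Python as follows. This is a hedged illustration only: the function names icc_test and bicc are hypothetical, the icc bits are passed in as booleans, and the sketch relies on the observation from the opcode table that setting the high-order cond bit negates the test selected by the low three bits.

```python
def icc_test(cond, n, z, v, c):
    """Evaluate the 4-bit cond field against the icc bits N, Z, V, C."""
    base = {
        0b000: False,              # BN
        0b001: z,                  # BE
        0b010: z or (n != v),      # BLE
        0b011: n != v,             # BL
        0b100: c or z,             # BLEU
        0b101: c,                  # BCS
        0b110: n,                  # BNEG
        0b111: v,                  # BVS
    }[cond & 0b111]
    # In the table, the high cond bit negates the test (BA = not BN, BNE = not BE, ...).
    return (not base) if cond & 0b1000 else bool(base)


def bicc(pc, cond, annul, disp22, n, z, v, c):
    """Return (taken, target, delay_executed) for a Bicc located at address pc."""
    taken = icc_test(cond, n, z, v, c)
    disp = disp22 - (1 << 22) if disp22 & (1 << 21) else disp22   # sign_ext(disp22)
    target = (pc + 4 * disp) & 0xFFFFFFFF if taken else None
    if cond == 0b1000:             # BA: a=1 annuls even though the branch is taken
        delay_executed = not annul
    else:                          # all others: a=1 annuls only when not taken
        delay_executed = taken or not annul
    return taken, target, delay_executed
```

Note that BN falls out of the general rule: it is never taken, so its delay instruction is executed exactly when a=0.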



B.19.1. Programming Note 

An untaken branch takes as much or more time than a taken branch. The additional time it 
takes is implementation-dependent. 

Traps: 

(none) 
B.20. Floating-point Branch on Condition Instructions



opcode  cond  operation                                 fcc test

FBA     1000  Branch Always                             1
FBN     0000  Branch Never                              0
FBU     0111  Branch on Unordered                       U
FBG     0110  Branch on Greater                         G
FBUG    0101  Branch on Unordered or Greater            G or U
FBL     0100  Branch on Less                            L
FBUL    0011  Branch on Unordered or Less               L or U
FBLG    0010  Branch on Less or Greater                 L or G
FBNE    0001  Branch on Not Equal                       L or G or U
FBE     1001  Branch on Equal                           E
FBUE    1010  Branch on Unordered or Equal              E or U
FBGE    1011  Branch on Greater or Equal                E or G
FBUGE   1100  Branch on Unordered or Greater or Equal   E or G or U
FBLE    1101  Branch on Less or Equal                   E or L
FBULE   1110  Branch on Unordered or Less or Equal      E or L or U
FBO     1111  Branch on Ordered                         E or L or G



Format (2):

| 00 | a | cond | 110 |        disp22        |
 31   29  28     24    21                   0



Suggested Assembly Language Syntax

fba{,a}     label
fbn{,a}     label
fbu{,a}     label
fbg{,a}     label
fbug{,a}    label
fbl{,a}     label
fbul{,a}    label
fblg{,a}    label
fbne{,a}    label     synonym: fbnz
fbe{,a}     label     synonym: fbz
fbue{,a}    label
fbge{,a}    label
fbuge{,a}   label
fble{,a}    label
fbule{,a}   label
fbo{,a}     label

* * * NOTE * * *

To set the "annul" bit for FBfcc instructions, append an
(optional) ",a" to the opcode. For example, use "fbl,a label".
The preceding table indicates that the ",a" is optional by
enclosing it in braces ({}).
Description:



An FBfcc instruction (except FBA and FBN) evaluates the floating-point condition codes (fcc)
according to the cond field. If the condition codes evaluate to true, the branch is taken and
the instruction causes a PC-relative, delayed control transfer to the address "PC + (4 *
sign_ext(disp22))." If the condition codes evaluate to false, the branch is not taken. If the
branch is not taken and the a (annul) field is set, the delay instruction is not executed
(annulled). If the branch is taken, the annul field is ignored and the delay instruction is
executed. (Annulment, delay instructions, and delayed control transfers are described further in
the section Instructions.)

FBN (Branch Never) acts like a "NOP", except that if the annul field is one, the delay
instruction is not executed (annulled). If the annul field is zero, the delay instruction is executed.

FBA (Branch Always) causes a transfer of control, irrespective of the value of the condition
code bits. If the annul field is one, the delay instruction is not executed (annulled). If the
annul field is zero, the delay instruction is executed.

An FBfcc instruction generates an fp_disabled trap (and does not branch or annul) if the
PSR's EF bit is reset or if the FPU is not present.

* * * NOTE * * *

Except for FBA, all FBfcc instructions with a=1 annul the delay
instruction when the branch is not taken. However, FBA with
a=1 does the reverse: it annuls the delay instruction even though
the branch is taken.

The instruction executed immediately before an FBfcc must not
be a floating-point instruction.
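
Because the fcc test column is a set of relations (E, L, G, U), the FBfcc decision can be sketched as a simple membership test. The Python below is an illustrative model only: FCC_TESTS, relation, and fbfcc_taken are hypothetical names, and the sketch assumes that L means the first operand compared less than the second and that U (unordered) arises when either operand is a NaN.

```python
import math

# mnemonic -> relations for which the branch is taken (from the fcc test column)
FCC_TESTS = {
    "fba": "ELGU", "fbn": "",
    "fbu": "U", "fbg": "G", "fbug": "GU", "fbl": "L", "fbul": "LU",
    "fblg": "LG", "fbne": "LGU", "fbe": "E", "fbue": "EU", "fbge": "EG",
    "fbuge": "EGU", "fble": "EL", "fbule": "ELU", "fbo": "ELG",
}


def relation(a, b):
    """Classify two floats as E, L, G, or U (assumed meaning of the fcc relation)."""
    if math.isnan(a) or math.isnan(b):
        return "U"                      # unordered: at least one operand is a NaN
    if a == b:
        return "E"
    return "L" if a < b else "G"


def fbfcc_taken(mnemonic, rel):
    """True if the named FBfcc branch is taken for the given relation."""
    return rel in FCC_TESTS[mnemonic]
```

The sketch makes one consequence of the table easy to see: fbne is taken for unordered operands, whereas fblg is not.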



B.20.1. Programming Note 

An untaken branch takes as much or more time than a taken branch. The additional time it 
takes is implementation-dependent. 



Traps: 



fp_disabled 
fp_exception 
B.21. Coprocessor Branch on Condition Instructions



opcode  cond  bp_CP_cc[1:0] test

CBA     1000  Always
CBN     0000  Never
CB3     0111  3
CB2     0110  2
CB23    0101  2 or 3
CB1     0100  1
CB13    0011  1 or 3
CB12    0010  1 or 2
CB123   0001  1 or 2 or 3
CB0     1001  0
CB03    1010  0 or 3
CB02    1011  0 or 2
CB023   1100  0 or 2 or 3
CB01    1101  0 or 1
CB013   1110  0 or 1 or 3
CB012   1111  0 or 1 or 2



Format (2):

| 00 | a | cond | 111 |        disp22        |
 31   29  28     24    21                   0



Suggested Assembly Language Syntax

cba{,a}     label
cbn{,a}     label
cb3{,a}     label
cb2{,a}     label
cb23{,a}    label
cb1{,a}     label
cb13{,a}    label
cb12{,a}    label
cb123{,a}   label
cb0{,a}     label
cb03{,a}    label
cb02{,a}    label
cb023{,a}   label
cb01{,a}    label
cb013{,a}   label
cb012{,a}   label

* * * NOTE * * *

To set the "annul" bit for CBccc instructions, append an
(optional) ",a" to the opcode. For example, use "cb12,a label".
The preceding table indicates that the ",a" is optional by
enclosing it in braces ({}).
Description:



A CBccc instruction (except CBA and CBN) evaluates the coprocessor condition codes
(supplied by the coprocessor on bp_CP_cc[1:0]) according to the cond field. If the condition
codes evaluate to true, the branch is taken and the instruction causes a PC-relative, delayed
control transfer to the address "PC + (4 * sign_ext(disp22))." If the condition codes
evaluate to false, the branch is not taken and the instruction acts like a "NOP."

If the branch is not taken and the a (annul) field is set, the delay instruction is not executed
(annulled). If the branch is taken, the annul field is ignored and the delay instruction is
executed. (Annulment, delay instructions, and delayed control transfers are described further in
the section Instructions.)

CBN (Branch Never) acts like a "NOP", except that if the annul field is one, the delay
instruction is not executed (annulled). If the annul field is zero, the delay instruction is
executed.

CBA (Branch Always) causes a transfer of control, irrespective of the value of the condition
code bits. If the annul field is one, the delay instruction is not executed (annulled). If the
annul field is zero, the delay instruction is executed.

A CBccc instruction generates a cp_disabled trap (and does not branch or annul) if the
PSR's EC bit is reset or if no coprocessor is present.

* * * NOTE * * *

Except for CBA, all CBccc instructions with a=1 annul the delay
instruction when the branch is not taken. However, CBA with
a=1 does the reverse: it annuls the delay instruction even though
the branch is taken.

A CBccc instruction must be immediately preceded by a non-coprocessor
instruction.
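
As an illustration of how the format 2 fields and the cond table fit together, the following Python sketch decodes an instruction word and applies the bp_CP_cc[1:0] test. The names decode_format2, CB_TESTS, and cbccc_taken are hypothetical, and the sketch assumes the field boundaries shown in the format diagram (op in bits 31:30, a in bit 29, cond in bits 28:25, op2 in bits 24:22, disp22 in bits 21:0).

```python
def decode_format2(word):
    """Split a 32-bit format 2 instruction word into its fields."""
    return {
        "op":     (word >> 30) & 0x3,
        "a":      (word >> 29) & 0x1,
        "cond":   (word >> 25) & 0xF,
        "op2":    (word >> 22) & 0x7,
        "disp22": word & 0x3FFFFF,
    }


# cond -> set of bp_CP_cc[1:0] values for which the branch is taken
CB_TESTS = {
    0b1000: {0, 1, 2, 3}, 0b0000: set(),
    0b0111: {3},       0b0110: {2},       0b0101: {2, 3},    0b0100: {1},
    0b0011: {1, 3},    0b0010: {1, 2},    0b0001: {1, 2, 3}, 0b1001: {0},
    0b1010: {0, 3},    0b1011: {0, 2},    0b1100: {0, 2, 3}, 0b1101: {0, 1},
    0b1110: {0, 1, 3}, 0b1111: {0, 1, 2},
}


def cbccc_taken(word, cp_cc):
    """True if the CBccc encoded in word is taken for coprocessor condition cp_cc."""
    fields = decode_format2(word)
    if fields["op"] != 0b00 or fields["op2"] != 0b111:
        raise ValueError("not a CBccc instruction")
    return cp_cc in CB_TESTS[fields["cond"]]
```

For instance, the word for "cb12,a label" has a=1 and cond=0010, so the branch is taken when the coprocessor supplies 1 or 2.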



B.21.1. Programming Note 

An untaken branch takes as much or more time than a taken branch. The additional time it 
takes is implementation-dependent. 

Traps: 

cp_disabled 
cp_exception 
B.22. CALL Instruction



opcode  op  operation

CALL    01  Call

Format (1):

| 01 |              disp30              |
 31   29                               0

Suggested Assembly Language Syntax

call    label



Description: 

The CALL instruction causes an unconditional, delayed, PC-relative control transfer to
address "PC + (4 * disp30)". Since the word displacement (disp30) field is 30 bits wide, the
target address can be arbitrarily distant. The CALL instruction also writes the value of PC,
which contains the address of the CALL, into out register r[15].

The PC-relative displacement is formed by appending two low-order zeros to the
instruction's 30-bit word displacement field.
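
The address arithmetic can be sketched in Python as below. This is an illustrative model only: the function name call and the list regs standing for the r registers are hypothetical. It shows why a 30-bit word displacement reaches any word address: shifting it left by two gives a full 32-bit byte offset, and the addition wraps modulo 2^32.

```python
def call(pc, disp30, regs):
    """Model a CALL at address pc; returns the target address."""
    regs[15] = pc                              # address of the CALL itself
    return (pc + (disp30 << 2)) & 0xFFFFFFFF   # append two low-order zeros, wrap at 32 bits
```

With disp30 equal to all ones, for example, the target is the word immediately before the CALL.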



B.22.1. Programming Note 

A JMPL instruction with rd= 15 can be used as a register-indirect CALL. 

B.22.2. Programming Note 

The execution time of a CALL instruction may increase if the next instruction uses r[15] as a
source operand. Whether this happens is implementation-dependent.

Traps: 

(none) 
B.23. Jump and Link Instruction



opcode  op3     operation

JMPL    111000  Jump and Link

Format (3):

| 10 |  rd  |  op3  |  rs1  | i=0 | ignored |  rs2  |
 31   29     24      18      13    12        4     0

| 10 |  rd  |  op3  |  rs1  | i=1 |      simm13     |
 31   29     24      18      13    12              0

Suggested Assembly Language Syntax

jmpl    address, reg_rd



Description: 



The JMPL instruction causes a register-indirect control transfer to an address specified by
either "r[rs1] + r[rs2]" if the i field is zero, or "r[rs1] + sign_ext(simm13)" if the i field is one.

The JMPL instruction writes the PC, which contains the address of the JMPL instruction, into
the destination r register specified in the rd field.

If either of the low-order two bits of the jump address is nonzero, a
mem_address_not_aligned trap occurs.
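
A minimal Python sketch of this behavior follows. The names jmpl, sign_ext13, and MemAddressNotAligned are hypothetical, regs is a plain list standing for the r registers, and two details are assumptions of the sketch rather than statements from the text above: the alignment check is performed before r[rd] is written, and writes to r[0] are discarded.

```python
class MemAddressNotAligned(Exception):
    """Stands in for the mem_address_not_aligned trap."""


def sign_ext13(simm13):
    """Sign-extend a 13-bit immediate to a Python integer."""
    return simm13 - (1 << 13) if simm13 & (1 << 12) else simm13


def jmpl(pc, regs, rd, rs1, i, rs2_or_simm13):
    """Model a JMPL at address pc; returns the jump target."""
    operand = sign_ext13(rs2_or_simm13) if i else regs[rs2_or_simm13]
    target = (regs[rs1] + operand) & 0xFFFFFFFF
    if target & 0x3:                       # either low-order bit nonzero
        raise MemAddressNotAligned(hex(target))
    if rd != 0:                            # assumption: r[0] is never modified
        regs[rd] = pc                      # address of the JMPL itself
    return target
```

Called with rd=0, rs1=31, and an immediate of 8, the sketch reproduces the subroutine-return idiom: the target is r[31] + 8 and no register is changed.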



B.23.1. Programming Note 

JMPL with rd=0 can be used to return from a subroutine. The typical return address is
"r[31]+8", if the subroutine was entered by a CALL instruction.



B.23.2. Programming Note

JMPL with rd=15 can be used as a register-indirect CALL.

B.23.3. Programming Note

The execution time of a JMPL instruction may increase if the next instruction uses r[rd] as a
source operand. Whether this happens is implementation-dependent.

Traps: 

mem_address_not_aligned 
B.24. Return from Trap Instruction



opcode  op3     operation

RETT†   111001  Return from Trap

† privileged instruction

Format (3):

| 10 | ignored |  op3  |  rs1  | i=0 | ignored |  rs2  |
 31   29        24      18      13    12        4     0

| 10 | ignored |  op3  |  rs1  | i=1 |      simm13     |
 31   29        24      18      13    12              0

Suggested Assembly Language Syntax

rett    address



Description: 



The RETT instruction adds one to the CWP (modulo the number of implemented windows)
and compares this value, the "new_CWP," against the Window Invalid Mask (WIM) register.
If the WIM bit indexed by the new_CWP is set, "(WIM and 2^new_CWP) = 1," then a
window_underflow trap is generated. If the WIM bit indexed by the new_CWP is reset, then
a window_underflow trap is not generated and new_CWP is written into CWP. This causes
the previous window to become the active window, thereby restoring the window that existed
at the time of the trap.

If a window_underflow trap is not generated, RETT causes a delayed control transfer to the
target address. The target address is either "r[rs1] + r[rs2]" if the i field is zero, or "r[rs1] +
sign_ext(simm13)" if the i field is one. Furthermore, RETT restores the S field of the PSR
from the PS field, and sets the ET field to one.

If traps are enabled (ET=1), an illegal_instruction trap occurs. If traps are disabled (ET=0)
and the processor is not in supervisor mode (S=0), or if a window_underflow condition is
detected, or if either of the low-order two bits of the target address is nonzero, a reset trap
occurs. If a reset trap occurs, the tt field of the TBR encodes the trap condition:
privileged_instruction, window_underflow, or mem_address_not_aligned.

* * * NOTE * * *

The instruction executed immediately before a RETT must be a
JMPL instruction. (See discussion in the section "Instructions".)
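
The trap conditions listed in the description can be summarized as a small decision function. The Python below is a sketch under stated assumptions: rett_outcome is a hypothetical name, and the order in which the three reset conditions are checked is chosen for illustration only, since the text above does not give their relative priority.

```python
def rett_outcome(et, s, new_cwp_invalid, target):
    """Classify what a RETT does, given PSR.ET, PSR.S, the WIM test, and the target."""
    if et == 1:
        return "illegal_instruction"          # traps enabled
    # From here on traps are disabled (ET=0), so any failure is a reset trap.
    if s == 0:
        return "reset (privileged_instruction)"
    if new_cwp_invalid:
        return "reset (window_underflow)"
    if target & 0x3:
        return "reset (mem_address_not_aligned)"
    return "return"                           # delayed transfer; S restored from PS, ET set to 1
```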



B.24.1. Programming Note 

To re-execute the trapped instruction when returning from a trap handler use the sequence:

jmpl    %17, %0        ! old PC
rett    %18            ! old nPC

To return to the instruction after the trapped instruction (e.g. when emulating an instruction)
use the sequence:

jmpl    %18, %0        ! old nPC
rett    %18 + 4        ! old nPC + 4



Traps: 

illegal_instruction
reset (privileged_instruction)
reset (mem_address_not_aligned)
reset (window_underflow)
B.25. Trap on Integer Condition Instruction



opcode  cond  operation                                              icc test

TA      1000  Trap Always                                            1
TN      0000  Trap Never                                             0
TNE     1001  Trap on Not Equal                                      not Z
TE      0001  Trap on Equal                                          Z
TG      1010  Trap on Greater                                        not (Z or (N xor V))
TLE     0010  Trap on Less or Equal                                  Z or (N xor V)
TGE     1011  Trap on Greater or Equal                               not (N xor V)
TL      0011  Trap on Less                                           N xor V
TGU     1100  Trap on Greater Unsigned                               not (C or Z)
TLEU    0100  Trap on Less or Equal Unsigned                         (C or Z)
TCC     1101  Trap on Carry Clear (Greater than or Equal, Unsigned)  not C
TCS     0101  Trap on Carry Set (Less Than, Unsigned)                C
TPOS    1110  Trap on Positive                                       not N
TNEG    0110  Trap on Negative                                       N
TVC     1111  Trap on Overflow Clear                                 not V
TVS     0111  Trap on Overflow Set                                   V



Format (3):

| 10 | ignored | cond | 111010 |  rs1  | i=0 | ignored |  rs2  |
 31   29        28     24       18      13    12        4     0

| 10 | ignored | cond | 111010 |  rs1  | i=1 |      simm13     |
 31   29        28     24       18      13    12              0



Suggested Assembly Language Syntax

ta      address
tn      address
tne     address     synonym: tnz
te      address     synonym: tz
tg      address
tle     address
tge     address
tl      address
tgu     address
tleu    address
tcc     address     synonym: tgeu
tcs     address     synonym: tlu
tpos    address
tneg    address
tvc     address
tvs     address





Description: 

A Ticc instruction evaluates the integer condition codes (icc) according to the cond field. If
the condition codes evaluate to true and there are no higher priority traps pending, then a
trap_instruction trap is generated. If the condition codes evaluate to false, a trap_instruction
trap does not occur.

If a trap_instruction trap is generated, the tt field of the Trap Base Register (TBR) is written
with 128 plus the least significant seven bits of either "r[rs1] + r[rs2]" if the i field is zero, or
"r[rs1] + sign_ext(simm13)" if the i field is one.

See the section Traps, Exceptions and Error Handling for the complete definition of a trap. 
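
The trap-type computation is a one-line formula; the Python sketch below spells it out. The name ticc_tt is hypothetical, and operand stands for either r[rs2] or the sign-extended simm13, depending on the i field.

```python
def ticc_tt(rs1_value, operand):
    """tt value written to the TBR when a Ticc traps."""
    return 128 + ((rs1_value + operand) & 0x7F)   # 128 plus the low seven bits of the sum
```

Since only seven bits of the sum are kept, the resulting tt always falls in the range 128 to 255.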

Traps: 

trap_instruction
B.26. Read State Register Instructions



opcode   op3     operation

RDY      101000  Read Y register
RDPSR†   101001  Read Processor State Register
RDWIM†   101010  Read Window Invalid Mask register
RDTBR†   101011  Read Trap Base Register

† privileged instruction

Format (3):

| 10 |  rd  |  op3  | ignored | ignored | ignored |
 31   29     24      18        13        12      0

Suggested Assembly Language Syntax

rd    %y, reg_rd
rd    %psr, reg_rd
rd    %wim, reg_rd
rd    %tbr, reg_rd



Description: 

These instructions read the specified IU state registers into the r register specified in the rd
field.



B.26.1. Programming Note

The execution time of any of these instructions may increase if the next instruction uses the
register specified by the rd field of this instruction as a source operand. Whether it does or
not is implementation-dependent.

Traps:

privileged_instruction (RDPSR, RDWIM and RDTBR only)
B.27. Write State Register Instructions



opcode   op3     operation

WRY      110000  Write Y register
WRPSR†   110001  Write Processor State Register
WRWIM†   110010  Write Window Invalid Mask register
WRTBR†   110011  Write Trap Base Register

† privileged instruction

Format (3):

| 10 | ignored |  op3  |  rs1  | i=0 | ignored |  rs2  |
 31   29        24      18      13    12        4     0

| 10 | ignored |  op3  |  rs1  | i=1 |      simm13     |
 31   29        24      18      13    12              0

Suggested Assembly Language Syntax

wr    reg_rs1, reg_or_imm, %y
wr    reg_rs1, reg_or_imm, %psr
wr    reg_rs1, reg_or_imm, %wim
wr    reg_rs1, reg_or_imm, %tbr





Description: 



These instructions write either "r[rs1] xor r[rs2]" if the i field is zero, or "r[rs1] xor
sign_ext(simm13)" if the i field is one, to the writeable subfields of the specified IU state
register.

WRPSR does not write the PSR and causes an illegal_instruction trap if the result would
cause the CWP field of the PSR to point to an unimplemented window.
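
Note that the value written is an exclusive-or of the two operands, not a sum. The Python sketch below illustrates this and the WRPSR window check. The names wr_value and wrpsr_allowed are hypothetical, and the sketch assumes that the CWP field occupies the low-order five bits of the PSR, which is not stated in this section.

```python
def wr_value(rs1_value, operand):
    """Value presented to the state register: r[rs1] xor the second operand."""
    return (rs1_value ^ operand) & 0xFFFFFFFF


def wrpsr_allowed(value, nwindows):
    """False if the new CWP would select an unimplemented window."""
    cwp = value & 0x1F            # assumption: CWP is PSR bits 4:0
    return cwp < nwindows
```

A common consequence is that using r[0] as rs1 writes the second operand unchanged, because x xor 0 = x.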

These instructions are delayed-write instructions:

1. If any of the three instructions after a WRPSR uses any field of the PSR that is changed
by the WRPSR, the value of that field is unpredictable. (Note that any instruction which
references a non-global register implicitly uses the CWP.)

2. If a WRPSR instruction is updating the PSR's PIL to a new value and is simultaneously
setting ET to 1, this can result in an interrupt trap at a level equal to the old value of the
PIL.



B.27.1. Programming Note

Two WRPSR instructions should be used when enabling traps and changing the value of the PIL.
The first WRPSR should specify ET=0 with the new PIL value, and the second WRPSR should
specify ET=1 and the new PIL value.

3. If any of the three instructions after a WRWIM is a SAVE, RESTORE or RETT, the
occurrence of window_overflow and window_underflow traps is unpredictable.

4. If any of the three instructions that follow a WRY is a MULScc or RDY, the value of Y
used is unpredictable.

5. If any of the three instructions that follow a WRTBR causes a trap, the trap base
address (TBA) used may be either the old or the new value.

6. If any of the three instructions after a write state register instruction reads the modified
state register, the value read is unpredictable.

7. If any of the three instructions after a write state register instruction is trapped, a
subsequent read state register instruction in the trap handler will get the register's new value.



Traps: 



privileged_instruction (WRPSR, WRWIM and WRTBR only)
illegal_instruction (WRPSR only)
B.28. Unimplemented Instruction



opcode  op  op2  operation

UNIMP   00  000  Unimplemented

Format (2):

| 00 | ignored | 000 |       const22        |
 31   29        24    21                   0

Suggested Assembly Language Syntax

unimp    const22



Description: 

The UNIMP instruction causes an illegal_instruction trap. The const22 value is ignored.

B.28.1. Programming Note

This instruction can be used as part of the protocol for calling a function that is expected to
return an aggregate value, such as a C-language structure. See Appendix D for an example.

a) An UNIMP instruction is placed after (not in) the delay slot after the CALL instruction in
the calling function.

b) If the callee function is expecting to return a structure, it will find the size of the structure
that the caller expects to be returned as the const22 operand of the UNIMP instruction.
The callee can check the opcode to make sure it is indeed UNIMP.

c) If the function is not going to return a structure, upon returning it attempts to execute the
UNIMP instruction rather than skipping over it as it should. This causes the program to
terminate. This behavior adds some run-time type checking to an interface that cannot
be checked properly at compile time.
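
Step b) of the protocol amounts to inspecting the instruction word that follows the caller's delay slot. The Python sketch below shows one way that check could look; unimp_size is a hypothetical name, and the field positions are taken from the format 2 diagram (op in bits 31:30, op2 in bits 24:22, const22 in bits 21:0, bits 29:25 ignored).

```python
def unimp_size(word):
    """Return const22 if word encodes UNIMP (op=00, op2=000), otherwise None."""
    op = (word >> 30) & 0x3
    op2 = (word >> 22) & 0x7
    if op == 0 and op2 == 0:
        return word & 0x3FFFFF     # structure size the caller expects
    return None                    # caller did not follow the aggregate-return protocol
```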

Traps: 

illegal_instruction
B.29. Instruction Cache Flush Instruction



opcode  op3     operation

IFLUSH  111011  Instruction cache Flush

Format (3):

| 10 | ignored |  op3  |  rs1  | i=0 | ignored |  rs2  |
 31   29        24      18      13    12        4     0

| 10 | ignored |  op3  |  rs1  | i=1 |      simm13     |
 31   29        24      18      13    12              0

Suggested Assembly Language Syntax

iflush    address



Description: 

The IFLUSH instruction causes a word to be flushed from an instruction cache that may be
internal to the processor. The address of the word to be flushed is either "r[rs1] + r[rs2]" if
the i field is zero, or "r[rs1] + sign_ext(simm13)" if the i field is one.



B.29.1. Implementation Note:

If there is no instruction cache internal to the processor, IFLUSH acts as a "NOP." If there is
an internal instruction cache, IFLUSH flushes the addressed word from the cache. If there is
an external instruction cache, IFLUSH causes an illegal_instruction trap. The presence of
an external instruction cache is determined by the bp_I_cache_present signal.

Traps:

illegal_instruction
B.30. Floating-point Operate (FPop) Instructions



opcode  op3     operation

FPop1   110100  Floating-point operate
FPop2   110101  Floating-point operate

Format (3):

| 10 |  rd  | 110100 |  rs1  |   opf   |  rs2  |
 31   29     24       18      13        4     0

| 10 |  rd  | 110101 |  rs1  |   opf   |  rs2  |
 31   29     24       18      13        4     0



The Floating-point Operate (FPop) instructions are encoded using two type 3 instruction formats
called FPop1 and FPop2. The floating-point operations themselves are encoded by the opf field.
(Note that the load/store floating-point instructions are not "FPop" instructions.)

All FPop instructions take all operands from and return all results to f registers and/or the FSR.
They perform operations on ANSI/IEEE 754-1985 single, double, and extended formats (see the
section SPARC Architecture Overview).

All multiple-precision floating-point instructions (including load/store floating-point) assume that
operands are located in register pairs (for double precision) or quadruples (for extended
precision). The following table indicates the alignment assumptions. Note that single-precision
operands can be in any f register.



operand          f register address

double-e         0 mod 2
double-f         1 mod 2

extended-e       0 mod 4
extended-f       1 mod 4
extended-f-low   2 mod 4
extended-u       3 mod 4



According to this convention, the least significant bit of an f register address is ignored by 
double-precision FPops and the least significant two bits of an f register address are ignored by 
extended-precision FPops. 
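
The effect of ignoring the low-order register-address bits can be sketched as a masking operation. In the Python below, fpop_base_register is a hypothetical name and the precision strings are labels chosen for the example.

```python
def fpop_base_register(reg, precision):
    """First f register actually used by an FPop for a given register address."""
    ignored_bits = {"single": 0b00, "double": 0b01, "extended": 0b11}[precision]
    return reg & ~ignored_bits     # clear the address bits the FPop ignores
```

So a double-precision operand addressed as f5 occupies the pair starting at f4, and an extended-precision operand addressed as f7 occupies the quadruple starting at f4.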

A program including floating-point computations generates the same results as if all instructions 
were executed sequentially (assuming it runs to completion). Note that floating-point loads and 
stores are not floating-point operate instructions. 

Results are written (or traps are caused) in the order that FPops are encountered in the
instruction stream. The section Instructions explains this in more detail. An FPop instruction
causes an fp_disabled trap if the EF field of the PSR is 0 or if no FPU is present.
B.30.1. Convert Integer to Floating-point Instructions



opcode  opf        operation

FiTOs   011000100  Convert Integer to Single
FiTOd   011001000  Convert Integer to Double
FiTOx   011001100  Convert Integer to Extended

Format (3):

| 10 |  rd  | 110100 | ignored |   opf   |  rs2  |
 31   29     24       18        13        4     0

Suggested Assembly Language Syntax

fitos    freg_rs2, freg_rd
fitod    freg_rs2, freg_rd
fitox    freg_rs2, freg_rd



Description: 

These instructions convert the 32-bit integer argument in the f register specified by rs2 into a
floating-point number in the destination format according to the ANSI/IEEE 754-1985
specification. They place the result in the destination f register(s) specified by rd.

For FiTOs and FiTOx with single-precision rounding, rounding is performed according to the 
rounding direction (RD) field of the FSR. 



Traps: 



fp_disabled 

fp_exception (NX) (FiTOs and FiTOx when RP=single) 
B.30.2. Convert Floating-point to Integer



opcode  opf        operation

FsTOi   011010001  Convert Single to Integer
FdTOi   011010010  Convert Double to Integer
FxTOi   011010011  Convert Extended to Integer

Format (3):

| 10 |  rd  | 110100 | ignored |   opf   |  rs2  |
 31   29     24       18        13        4     0

Suggested Assembly Language Syntax

fstoi    freg_rs2, freg_rd
fdtoi    freg_rs2, freg_rd
fxtoi    freg_rs2, freg_rd



Description: 

These instructions convert the floating-point source argument in the f register or f registers
specified by rs2 to a 32-bit integer (in the f register specified by the rd field) according to the
ANSI/IEEE 754-1985 specification.

The floating-point argument is rounded toward zero and the RD field of the FSR is ignored.
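
Rounding toward zero means the fractional part is simply discarded, for negative values as well as positive ones. The Python sketch below models this; ftoi and FpException are hypothetical names, and treating NaN, infinity, and out-of-range values as the invalid (NV) case is an assumption based on the usual ANSI/IEEE 754 rules rather than on the text above.

```python
import math


class FpException(Exception):
    """Stands in for an fp_exception trap; the message names the IEEE exception."""


def ftoi(value):
    """Convert a float to a signed 32-bit integer, rounding toward zero.

    Returns (result, inexact), where inexact corresponds to the NX condition.
    """
    if math.isnan(value) or math.isinf(value):
        raise FpException("NV")
    result = math.trunc(value)                      # round toward zero
    if not -(1 << 31) <= result < (1 << 31):        # does not fit in 32 bits
        raise FpException("NV")
    return result, result != value
```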

Traps: 

fp_disabled 
fp_exception (NV, NX) 
B.30.3. Convert Between Floating-point Formats Instructions



opcode  opf        operation

FsTOd   011001001  Convert Single to Double
FsTOx   011001101  Convert Single to Extended
FdTOs   011000110  Convert Double to Single
FdTOx   011001110  Convert Double to Extended
FxTOs   011000111  Convert Extended to Single
FxTOd   011001011  Convert Extended to Double

Format (3):

| 10 |  rd  | 110100 | ignored |   opf   |  rs2  |
 31   29     24       18        13        4     0

Suggested Assembly Language Syntax

fstod    freg_rs2, freg_rd
fstox    freg_rs2, freg_rd
fdtos    freg_rs2, freg_rd
fdtox    freg_rs2, freg_rd
fxtod    freg_rs2, freg_rd
fxtos    freg_rs2, freg_rd



Description: 

These instructions convert the floating-point source argument in the f register or f registers
specified by rs2 to a floating-point number in the destination format according to the
ANSI/IEEE 754-1985 specification. They place the result in the f register or f registers
specified by rd.

Rounding is performed according to the rounding direction (RD) field of the FSR. In the case 
of FdTOx, the outcome is also a function of the rounding precision (RP) field. 



Traps: 



fp_disabled 

fp_exception (OF, UF, NV, NX)
B.30.4. Floating-point Move Instructions



opcode  opf        operation

FMOVs   000000001  Move
FNEGs   000000101  Negate
FABSs   000001001  Absolute Value

Format (3):

| 10 |  rd  | 110100 | ignored |   opf   |  rs2  |
 31   29     24       18        13        4     0

Suggested Assembly Language Syntax

fmovs    freg_rs2, freg_rd
fnegs    freg_rs2, freg_rd
fabss    freg_rs2, freg_rd



Description: 

FMOVs moves a word from f[rs2] to f[rd]. Multiple FMOVs's are required to transfer a
multiple-precision number between f registers.

FNEGs complements the sign bit, and FABSs clears it.

These instructions do not round.



B.30.5. Programming Note

FNEGs or FABSs instructions can also operate on the high-order words (the word that
contains the sign bit) of double and extended operands. Thus an FNEGs instruction and an
FMOVs instruction would be used to negate a double and put the results in a different pair of
f registers.
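
The two-instruction idiom for negating a double can be checked with a short Python sketch that treats a double as a high-order and a low-order 32-bit word. The helper names are hypothetical, and the big-endian packing is used only so that the first word is the one containing the sign bit.

```python
import struct


def double_to_words(x):
    """Split a double into (high, low) 32-bit words; high holds the sign bit."""
    return struct.unpack(">II", struct.pack(">d", x))


def words_to_double(high, low):
    """Reassemble a double from its two 32-bit words."""
    return struct.unpack(">d", struct.pack(">II", high, low))[0]


def negate_double_words(high, low):
    """Model FNEGs on the high word plus FMOVs on the low word."""
    return high ^ 0x80000000, low      # complement the sign bit, copy the rest
```

Only the sign bit changes, so the operation needs no rounding and behaves the same for zeros, infinities, and NaNs.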

Traps: 

fp_disabled 
B.31. Floating-point Square Root Instructions



opcode   opf        operation

FSQRTs   000101001  Square Root Single
FSQRTd   000101010  Square Root Double
FSQRTx   000101011  Square Root Extended

Format (3):

| 10 |  rd  | 110100 | ignored |   opf   |  rs2  |
 31   29     24       18        13        4     0




Description: 

These instructions generate the square root of the floating-point source argument in the f
register or f registers specified by rs2 according to the ANSI/IEEE 754-1985 specification.
They place the result in the destination f register or f registers specified by the rd field.

Rounding is performed according to the rounding direction (RD) field of the FSR. In the case
of FSQRTx, the outcome is also a function of the rounding precision (RP) field.



Traps:



fp_disabled
fp_exception (NV, NX)
B.31.1. Floating-point Add and Subtract Instructions







opcode  opf        operation

FADDs   001000001  Add Single
FADDd   001000010  Add Double
FADDx   001000011  Add Extended

FSUBs   001000101  Subtract Single
FSUBd   001000110  Subtract Double
FSUBx   001000111  Subtract Extended

Format (3):

| 10 |  rd  | 110100 |  rs1  |   opf   |  rs2  |
 31   29     24       18      13        4     0

Suggested Assembly Language Syntax

fadds    freg_rs1, freg_rs2, freg_rd
faddd    freg_rs1, freg_rs2, freg_rd
faddx    freg_rs1, freg_rs2, freg_rd

fsubs    freg_rs1, freg_rs2, freg_rd
fsubd    freg_rs1, freg_rs2, freg_rd
fsubx    freg_rs1, freg_rs2, freg_rd




Description: 













These instructions add or subtract their operands according to the ANSI/IEEE 754-1985
specification, and place the result in the f register or f registers specified in the rd field. The
subtract instructions subtract the floating-point value specified by rs2 from the one specified
by rs1.



Traps:



fp_disabled
fp_exception (OF, UF, NX)
B.31.2. Floating-point Multiply and Divide Instructions



Format (3): 



Description: 



opcode opf operation 


FMULs 
FMULd 
FMULx 


001001001 
001001010 
001001011 


Multiply Single 
Multiply Double 
Multiply Extended 


FDIVs 
FDlVd 
FDIVx 


001001101 
001001110 
001001111 


Divide Single 
Divide Double 
Divide Extended 



10 
TT — 


rd 

25 


'tt- 


110100 


rsl 


13' ■■ 


opt 


rs2 

4 



Suggested Assembly Language Syntax 



fmuls 
fmuld 
fmulx 



fregrsu 
fregrsh 



^regrsz, fregrd 
fregrs2, fregrd 
fregrs2, fregrd 



fdivs 
fdivd 
fdlvx 



fregrsh 
fregrsi. 
fregrsi. 



fregrs2. fregrd 
fregrsz. fregrd 
fregrs2, fregrd 



These instructions multiply or divide their operands according to the ANSI/IEEE 754-1985 
specification, and place the result in the f register ox f registers specified in the /cf field. The 
divide instmclions divide the floating-point value specified by rsl by the one specified by rs2. 



Traps: 



fp_disabled 

fp_exception (OF. UF, DZ (FDIV only), NV, NX) 
B.31.3. Floating-point Compare Instructions

    opcode    opf          operation

    FCMPs     001010001    Compare Single
    FCMPd     001010010    Compare Double
    FCMPx     001010011    Compare Extended
    FCMPEs    001010101    Compare Single and Exception if Unordered
    FCMPEd    001010110    Compare Double and Exception if Unordered
    FCMPEx    001010111    Compare Extended and Exception if Unordered

Format (3):

    +----+---------+--------+------+-----------+------+
    | 10 | ignored | 110101 | rs1  |    opf    | rs2  |
    +----+---------+--------+------+-----------+------+
     31   29        24       18     13          4    0

Suggested Assembly Language Syntax

    fcmps    fregrs1, fregrs2
    fcmpd    fregrs1, fregrs2
    fcmpx    fregrs1, fregrs2

    fcmpes   fregrs1, fregrs2
    fcmped   fregrs1, fregrs2
    fcmpex   fregrs1, fregrs2

Description:

These instructions compare their operands according to the ANSI/IEEE 754-1985
specification. The floating-point condition codes in the FSR are set as follows:

NOTE:

This table is a duplicate of Table 3-5 in the section "Registers".

    fcc    Relation

    0      fs1 = fs2
    1      fs1 < fs2
    2      fs1 > fs2
    3      fs1 ? fs2 (unordered)

In this table, fs1 refers to the value specified by the rs1 field and fs2 refers to the value
specified by the rs2 field of the compare instruction.

The "Compare and Cause Exception if Unordered" instructions (FCMPE) cause an invalid
exception (NV) if either of the operands is a signaling or quiet NaN. FCMP also causes an
invalid exception if either operand is a signaling NaN.

NOTE

A non-floating-point instruction must be executed between an
FCMP and a subsequent FBfcc.
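
The fcc encoding in the table above can be illustrated with a small Python sketch. This is only a model under stated assumptions: the function name fcmp and its exception_if_unordered flag are hypothetical, and because Python floats do not distinguish signaling from quiet NaNs, only the FCMPE quiet-NaN behavior is modeled.

```python
import math

# fcc encodings taken from the table above
FCC_EQUAL, FCC_LESS, FCC_GREATER, FCC_UNORDERED = 0, 1, 2, 3

def fcmp(fs1: float, fs2: float, exception_if_unordered: bool = False):
    """Return (fcc, nv) for a compare of fs1 against fs2.

    nv models the invalid (NV) exception raised by FCMPE on any NaN.
    The FCMP signaling-NaN case is not modeled, since Python floats
    cannot express the signaling/quiet distinction.
    """
    if math.isnan(fs1) or math.isnan(fs2):
        return FCC_UNORDERED, exception_if_unordered
    if fs1 == fs2:
        return FCC_EQUAL, False
    if fs1 < fs2:
        return FCC_LESS, False
    return FCC_GREATER, False
```

Calling fcmp with exception_if_unordered=True corresponds to the FCMPE variants; the default corresponds to FCMP with quiet NaN operands.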

Traps: 

fp_disabled 
fp_exception (NV) 
B.32. Coprocessor Operate Instructions

    opcode    op3       operation

    CPop1     110110    Coprocessor Operate
    CPop2     110111    Coprocessor Operate

Format (3):

    +----+------+--------+------+-----------+------+
    | 10 |  rd  | 110110 | rs1  |    opc    | rs2  |
    +----+------+--------+------+-----------+------+
     31   29     24       18     13          4    0

    +----+------+--------+------+-----------+------+
    | 10 |  rd  | 110111 | rs1  |    opc    | rs2  |
    +----+------+--------+------+-----------+------+
     31   29     24       18     13          4    0

NOTE

The assembly language syntax for these instructions is
unspecified.

The Coprocessor Operate (CPop) instructions are encoded via two type 3 instruction formats
called CPop1 and CPop2. The coprocessor operations themselves are encoded by the opc field
and are coprocessor-dependent. (Note that the load/store coprocessor instructions are not
"CPop" instructions.)

All CPop instructions take all operands from and return all results to coprocessor registers. The
data types supported by the coprocessor are coprocessor-dependent. Operand alignment is
coprocessor-dependent.

A CPop instruction causes a cp_disabled trap if the EC field of the PSR is 0 or if no coprocessor
is present.

Whether a CPop generates a cp_exception trap is coprocessor-dependent.
C.1. Introduction

This appendix provides a description of the SPARC architecture using the Instruction-Set
Processor (ISP) description language. It includes register definitions, instruction fields, processor
states, instruction dispatch, traps, and instruction descriptions.

The instruction interpreter defines the ordering of events. Except for a few cases (which are
documented), the interpreter together with the instruction and register definitions provide a
supplemental description of the processor.

Note that the use of a particular variable in the notation does not necessarily imply that its related
signal is present in an implementation, or visible to the programmer.

The instruction description language is a modified version of Bell and Newell's ISP instruction
description language, which was created to accurately describe computer instruction sets. While
the semantics are somewhat intuitive, the following guidelines provide important details:

The only data type is the bit vector. Variables are defined as bit vectors of particular widths,
declared as variable<n:m>. Variable subfields can be defined, also with the <n:m> notation.
The value of a vector is a number in a base indicated by its subscript. The default
base is decimal. Arrays of vectors are declared as array[n:m].

The notation <- indicates variable assignment, and := indicates a macro definition.

When a bit vector is assigned to another of greater length, the operand is right-justified in the
destination vector and the high-order positions are zero-filled. The macro zero_extend is
sometimes used to make this clear. Conversely, the macro sign_extend causes the high-order
positions of the result to be filled with the highest-order (sign) bit of its operand.

The semicolon ';' separates statements. Parentheses '()' group statements and expressions
that could otherwise be interpreted ambiguously.
All statements are generally executed "simultaneously." However, if the term next appears,
it indicates that the statement or statements which follow the next are executed after those
that appear before the next. Thus, all statements between next phrases are executed
concurrently. More precisely, this means that all expressions on the right hand sides of
assignments located between next's are evaluated first, after which the variables on the left hand
sides are updated. (This convention emulates synchronous, clocked hardware.)

For example, if A=0 and B=0, execution of the following two statements,

    A <- B+1;
    B <- A+1;

results in A=1 and B=1. However,

    A <- B+1;
    next;
    B <- A+1;

results in A=1 and B=2.
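
The evaluate-then-update rule can be checked with a short Python sketch. It is an illustration only, not part of the ISP notation: the helper run and the representation of statements as (target, expression) pairs are assumptions made for this example.

```python
def run(groups, state):
    """Execute ISP-style statement groups.

    groups is a list of statement groups separated by `next`; each group
    is a list of (target, expression) pairs, where expression is a
    function of the current state.
    """
    state = dict(state)
    for group in groups:
        # Evaluate every right-hand side against the state before the group...
        new_values = [(target, expr(state)) for target, expr in group]
        # ...then update all left-hand sides at once.
        for target, value in new_values:
            state[target] = value
    return state

a_stmt = ("A", lambda s: s["B"] + 1)   # A <- B+1
b_stmt = ("B", lambda s: s["A"] + 1)   # B <- A+1

concurrent = run([[a_stmt, b_stmt]], {"A": 0, "B": 0})    # no `next`
sequenced = run([[a_stmt], [b_stmt]], {"A": 0, "B": 0})   # `next` between them
```

With no next between the statements both right-hand sides see the initial values, so concurrent holds A=1 and B=1; with next, B is computed from the updated A, so sequenced holds A=1 and B=2.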

The symbol □ designates concatenation of vectors. A comma ',' on the left side of an
assignment separates quantities that are concatenated for the purpose of assignment. For
example, if the 2-bit vector T2 equals 3, and X, Y, and Z are 1-bit vectors, then:

    X, Y, Z <- 0□T2

results in X=0, Y=1, and Z=1.
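
A minimal Python sketch of the concatenation example follows. The helpers concat and split are hypothetical names introduced here; the sketch assumes that the leading 0 is a 1-bit vector and that the first operand of a concatenation is the most significant.

```python
def concat(*fields):
    """Concatenate (value, width) bit vectors, first field most significant."""
    value, width = 0, 0
    for v, w in fields:
        value = (value << w) | (v & ((1 << w) - 1))
        width += w
    return value, width

def split(value, widths):
    """Split a vector into fields of the given widths, most significant first."""
    out = []
    shift = sum(widths)
    for w in widths:
        shift -= w
        out.append((value >> shift) & ((1 << w) - 1))
    return out

T2 = 3                                   # 2-bit vector equal to 3
vec, width = concat((0, 1), (T2, 2))     # 0 concatenated with T2: binary 011
X, Y, Z = split(vec, [1, 1, 1])          # comma-separated left-hand side
```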

The operators '+' and '-' perform two's complement arithmetic.

The phrase fork, used only in the instruction interpreter for the FPop instructions, indicates
that the associated routine may be executed concurrently with all other subsequent
statements. There is no notation for rejoining: after the forked routine executes its last statement,
it terminates.

The major difference between the notation used here and the 1971 version of ISP is that the
notation here uses the more common:

    if cond then S1 else S2

whereas Bell and Newell used the following:

    (cond -> S1, ¬cond -> S2)

The macros memory_read and memory_write are implementation-dependent. These
routines define the interface without referring to implementation-specific signals:

    load_data <- memory_read(addr_space, address)

    memory_write(addr_space, address, byte_mask, store_data)

Memory_read returns the word in memory specified by both the address and the address
space identifier.

Memory_write writes all or part of the word store_data into the word specified by the given
address. If there is an exception, memory_write does not change the state of the external
system or the MMU. Byte_mask is a 4-bit value that indicates which of the four bytes in
store_data are to be written into the addressed word.
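
The byte_mask behavior can be sketched in Python as follows. This is an assumption-laden illustration rather than the interface itself: memory is modeled as a dict of word-aligned addresses, the addr_space argument and exceptions are omitted, and the pairing of mask bit 3 with store_data<31:24> is inferred from the store definitions later in this appendix (where byte_mask 1000 accompanies data shifted left by 24).

```python
def memory_write(memory, address, byte_mask, store_data):
    """Merge the selected bytes of store_data into the word at address.

    memory is a dict mapping word-aligned addresses to 32-bit words.
    Bit 3 of byte_mask selects store_data<31:24>; bit 0 selects
    store_data<7:0>.
    """
    word_addr = address & ~0x3
    new = memory.get(word_addr, 0)
    for bit in range(4):                      # bit 0 = least significant byte
        if byte_mask & (1 << bit):
            lane = 0xFF << (8 * bit)
            new = (new & ~lane) | (store_data & lane)
    memory[word_addr] = new & 0xFFFFFFFF
    return memory

def memory_read(memory, address):
    """Return the whole word containing address."""
    return memory.get(address & ~0x3, 0)

mem = {0: 0x11223344}
memory_write(mem, 0, 0b1000, 0xAABBCCDD)      # only the most significant byte
first = memory_read(mem, 0)
memory_write(mem, 0, 0b0011, 0x0000EEFF)      # only the low halfword
second = memory_read(mem, 0)
```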
C.2. Register Definitions

    PSR<31:0>;                      {Processor State Register}
        impl      := PSR<31:28>;
        ver       := PSR<27:24>;
        icc       := PSR<23:20>;
            N     := PSR<23>;
            Z     := PSR<22>;
            V     := PSR<21>;
            C     := PSR<20>;
        reserved  := PSR<19:14>;
        EC        := PSR<13>;
        EF        := PSR<12>;
        PIL       := PSR<11:8>;
        S         := PSR<7>;
        PS        := PSR<6>;
        ET        := PSR<5>;
        CWP       := PSR<4:0>;

    TBR<31:0>;                      {Trap Base Register}
        TBA       := TBR<31:12>;
        tt        := TBR<11:4>;
        zero      := TBR<3:0>;

    FSR<31:0>;                      {Floating-Point State Register}
        RD        := FSR<31:30>;
        RP        := FSR<29:28>;
        TEM       := FSR<27:23>;
            NVM   := FSR<27>;
            OFM   := FSR<26>;
            UFM   := FSR<25>;
            DZM   := FSR<24>;
            NXM   := FSR<23>;
        AU        := FSR<22>;
        reserved  := FSR<21:17>;
        ftt       := FSR<16:14>;
        qne       := FSR<13>;
        reserved  := FSR<12>;
        fcc       := FSR<11:10>;
        aexc      := FSR<9:5>;
            nva   := FSR<9>;
            ofa   := FSR<8>;
            ufa   := FSR<7>;
            dza   := FSR<6>;
            nxa   := FSR<5>;
        cexc      := FSR<4:0>;
            nvc   := FSR<4>;
            ofc   := FSR<3>;
            ufc   := FSR<2>;
            dzc   := FSR<1>;
            nxc   := FSR<0>;

    CSR<31:0>;                      {CP State Register}
    WIM<31:0>;                      {Window Invalid Mask Register}
    Y<31:0>;                        {Y Register}
    PC<31:0>;                       {Program Counter}
    nPC<31:0>;                      {Next Program Counter}
    FQ<63:0>;                       {Floating-Point Queue}
    CQ<63:0>;                       {Coprocessor Queue}
    G[1:7]<31:0>;                   {Global Registers}
    R[0:(16*NWINDOWS)-1]<31:0>;     {Windowed Registers}
    f[0:31]<31:0>;                  {Floating-Point Registers}

    r[n] := if (n = 0)
                then 0
            else if (1 ≤ n ≤ 7)
                then G[n]                       {globals}
                else R[(n-8) + (CWP*16)]        {windowed registers}


C.3. System Interface Definitions 

    bp_IRL<3:0>;
    bp_reset_in;
    pb_error;
    pb_retain_bus;
    bp_FPU_present;
    bp_CP_present;
    bp_I_cache_present;
    bp_CP_exception;
    bp_CP_cc<1:0>;
    bp_memory_exception;



C.4. Instruction Fields

The numbers in braces are the widths of the fields in bits.

    instruction<31:0>;
        op      {2}     := instruction<31:30>;
        op2     {3}     := instruction<24:22>;
        op3     {6}     := instruction<24:19>;
        opf     {9}     := instruction<13:5>;
        opc     {9}     := instruction<13:5>;
        asi     {8}     := instruction<12:5>;
        i       {1}     := instruction<13>;
        rd      {5}     := instruction<29:25>;
        a       {1}     := instruction<29>;
        cond    {4}     := instruction<28:25>;
        rs1     {5}     := instruction<18:14>;
        rs2     {5}     := instruction<4:0>;
        simm13  {13}    := instruction<12:0>;
        shcnt   {5}     := instruction<4:0>;
        disp30  {30}    := instruction<29:0>;
        disp22  {22}    := instruction<21:0>;
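
The field definitions above translate directly into shift-and-mask operations. The following Python sketch is one way to express them; the names bits, FIELDS, decode, and sign_extend are chosen for this example, and the sample word is assembled by hand from the FADDs encoding given in Appendix B.

```python
def bits(word, hi, lo):
    """Return word<hi:lo> as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

# (hi, lo) bit positions copied from the field list above
FIELDS = {
    "op": (31, 30), "op2": (24, 22), "op3": (24, 19), "opf": (13, 5),
    "opc": (13, 5), "asi": (12, 5), "i": (13, 13), "rd": (29, 25),
    "a": (29, 29), "cond": (28, 25), "rs1": (18, 14), "rs2": (4, 0),
    "simm13": (12, 0), "shcnt": (4, 0), "disp30": (29, 0), "disp22": (21, 0),
}

def decode(word):
    """Extract every field; which ones are meaningful depends on the format."""
    return {name: bits(word, hi, lo) for name, (hi, lo) in FIELDS.items()}

def sign_extend(value, width):
    """Interpret a width-bit vector as a two's complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

# FADDs with rs1=1, rs2=2, rd=3: op=10, op3=110100, opf=001000001
word = (0b10 << 30) | (3 << 25) | (0b110100 << 19) | (1 << 14) | (0b001000001 << 5) | 2
fields = decode(word)
```

Since the fields overlap, decode extracts all of them unconditionally; an interpreter would look only at the ones that apply to the instruction format selected by op.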
C.5. Processor States and Instruction Fetch

The IU can be in one of three states: execute_mode, reset_mode, or error_mode.

The FPU can be in one of five states: reset_mode, error_mode, fpu_execute_mode,
fpu_exception_pending_mode, or fpu_exception_mode. The FPU's reset_mode and error_mode
correspond to the IU's reset and error modes. The remaining FPU states are described in
Section C.6.

The processor (that is, the IU and FPU) is in reset_mode when bp_reset_in is asserted. The
processor remains in reset_mode until bp_reset_in is deasserted, at which point the IU enters
execute_mode and the FPU enters fpu_execute_mode.

When bp_reset_in is deasserted, the first instruction address is 0, with ASI=9 (supervisor
instruction).

The processor enters error_mode from any state except reset_mode if a synchronous trap is
generated while traps are disabled. (See the section Traps, Exceptions, and Error Handling.)
The processor remains in error_mode until bp_reset_in is asserted, at which time it enters
reset_mode.

C.5.1. Implementation Note

The external system should assert bp_reset_in whenever pb_error is detected.

The following ISP code defines the three IU states. In execute_mode, the IU fetches and
dispatches instructions.
    while (reset_mode) (
        if (bp_reset_in = 0) then (
            reset_mode <- 0;
            execute_mode <- 1;
            trap <- 1;
            reset <- 1
        )
    );

    addr_space := if S=0 then 8 else 9;

    while (execute_mode) (
        check_interrupts;                   { see Section C.8 }
        next;

        { the following code emulates the delayed nature of the
          write state register instructions. }

        PSR <- PSR'; PSR' <- PSR''; PSR'' <- PSR'''; PSR''' <- PSR'''';
        TBR <- TBR'; TBR' <- TBR''; TBR'' <- TBR'''; TBR''' <- TBR'''';
        WIM <- WIM'; WIM' <- WIM''; WIM'' <- WIM'''; WIM''' <- WIM'''';
        Y <- Y'; Y' <- Y''; Y'' <- Y'''; Y''' <- Y'''';
        next;

        if (trap = 1) then
            execute_trap;                   { see Section C.8 }
        next;

        instruction <- memory_read(addr_space, PC);
        next;

        if (bp_memory_exception = 1) then (
            trap <- 1;
            instruction_access_exception <- 1
        ) else (
            if (annul = 0) then (
                dispatch_instruction        { see Section C.5 }
            ) else (
                annul <- 0;
                PC <- nPC;
                nPC <- nPC + 4
            )
        )
    );

    while (error_mode) (
        if (bp_reset_in = 1) then (
            error_mode <- 0;
            reset_mode <- 1;
            pb_error <- 0
        )
    );
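
The chain of primed copies (PSR', PSR'', and so on) in the execute loop behaves like a shift register. The Python sketch below models one such chain. It is hedged on one assumption that this section does not state: that a write-state-register instruction deposits its value in the most-primed copy. The class name DelayedRegister is hypothetical.

```python
class DelayedRegister:
    """Model of a state register with four primed copies."""

    DEPTH = 4  # PSR', PSR'', PSR''', PSR''''

    def __init__(self, value=0):
        # stages[0] is the visible register, stages[DEPTH] the most-primed copy
        self.stages = [value] * (self.DEPTH + 1)

    @property
    def value(self):
        return self.stages[0]

    def write(self, value):
        """Assumption: a write targets the most-primed copy."""
        self.stages[-1] = value

    def step(self):
        """One pass of the loop: each copy takes the next more-primed value.

        All four assignments happen simultaneously, so the new list is
        built from the old one in a single expression.
        """
        self.stages = self.stages[1:] + [self.stages[-1]]

reg = DelayedRegister(0)
reg.write(7)
history = []
for _ in range(4):
    reg.step()
    history.append(reg.value)
```

Under that assumption the written value becomes visible on the fourth pass through the loop, which is the delay the comment in the ISP code refers to.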



C.6. Instruction Dispatch

The "dispatch_instruction" macro determines if the fetched instruction is an FPop or CPop. If it is
an FPop, it is executed by the "execute_FPU_instruction" macro (Section C.6) as soon as the
FPU can accept another instruction. If the fetched instruction is a CPop, it is executed by the
"execute_CP_instruction" macro (Section C.7) as soon as the CP can accept another instruction.

If the instruction is neither an FPop nor a CPop, it is executed by the "execute_IU_instruction"
macro, which includes all the macro definitions in Section C.9 (except for FPop and CPop).

Unused bit patterns in the op, op2, op3, opf, and i fields of instructions cause illegal_instruction
traps. Other fields that are defined to be unused are ignored and do not cause traps.

The macro 'floating_point_instr' returns a 1 if the instruction is a floating-point instruction.
Similarly, the macro 'coprocessor_instr' returns a 1 if the instruction is a coprocessor instruction.
    unimplemented_IU_instr := (
        if ( ( (op=00₂) and (op2=000₂) )            { UNIMP instruction }
             or
             ( ((op=11₂) or (op=10₂)) and (op3=unassigned) )
             or
             ( (i = 1) and
               (LDSBA or LDSHA or LDUBA or LDUHA or LDA or
                LDDA or STDA or LDSTUBA or SWAPA or
                STBA or STHA or STA)
             )
           ) then 1 else 0
    );

    floating_point_instr := (
        if (LDF or LDDF or LDFSR or
            STF or STDF or STFSR or STDFQ or
            FPop1 or FPop2 or FBfcc) then 1 else 0
    );

    coprocessor_instr := (
        if (LDC or LDDC or LDCSR or
            STC or STDC or STCSR or STDCQ or
            CPop1 or CPop2 or CBccc) then 1 else 0
    );

    dispatch_instruction := (
        if (unimpl_IU_instr = 1) then (
            trap <- 1;
            illegal_instruction <- 1
        );

        if (floating_point_instr = 1) then (
            if (EF = 0) then (
                trap <- 1;
                fp_disabled <- 1
            ) else (
                if (fpu_exception_pending_mode = 1) then (
                    fpu_exception_pending_mode <- 0;
                    fpu_exception_mode <- 1;
                    trap <- 1
                );
                while ( (fp_not_ready = 1) and (trap = 0) ) (
                    check_interrupts;
                )
            )
        );

        if (coprocessor_instr = 1) then (
            if (EC = 0) then (
                trap <- 1;
                cp_disabled <- 1
            ) else (
                check_CP_exception;
                next;
                while ( (cp_not_ready = 1) and (trap = 0) ) (
                    check_interrupts;
                )
            )
        );
        next;

        if (trap = 0) then (
            if (FPop1 or FPop2) then fork execute_FPU_instruction
            else if (CPop1 or CPop2) then fork execute_CP_instruction
            else execute_IU_instruction
        );
    );

    execute_IU_instruction := (
        { do routine for specific instruction, defined below }
        next;
        if (trap = 0 and
            not (CALL or RETT or JMPL or Bicc or FBfcc or CBccc or Ticc) ) then (
            PC <- nPC;
            nPC <- nPC + 4
        )
    );

    execute_FPU_instruction := (
        if (FPU_exception_mode) then (
            ftt <- sequence_error;
            FPU_exception_mode <- 0;            { see following discussion }
            FPU_exception_pending_mode <- 1
        ) else (
            enqueue_FQ(instruction, PC)
            { execute description defined below }
        )
    );
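
The last step of dispatch_instruction, choosing between the FPU, CP, and IU routines, can be sketched in Python from the op and op3 encodings given in Appendix B (op = 10; op3 = 110100 and 110101 for the FPop formats; 110110 and 110111 for CPop1 and CPop2). The function names classify and advance are hypothetical, and trap handling is left out.

```python
def classify(word):
    """Return 'FPop', 'CPop', or 'IU' for a 32-bit instruction word."""
    op = (word >> 30) & 0x3
    op3 = (word >> 19) & 0x3F
    if op == 0b10:
        if op3 in (0b110100, 0b110101):     # FPop1, FPop2
            return "FPop"
        if op3 in (0b110110, 0b110111):     # CPop1, CPop2
            return "CPop"
    return "IU"

def advance(pc, npc):
    """PC <- nPC; nPC <- nPC + 4, evaluated simultaneously."""
    return npc, (npc + 4) & 0xFFFFFFFF

fadds = (0b10 << 30) | (0b110100 << 19) | (0b001000001 << 5)
fcmps = (0b10 << 30) | (0b110101 << 19) | (0b001010001 << 5)
cpop1 = (0b10 << 30) | (0b110110 << 19)
```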






Solbourne Computer, Inc. 

C.7. Floating-Point Instruction Execution 

The FPU can execute floating-point operate (FPop) instructions concurrently with other FPops
and with non-floating-point instructions. To do this, it maintains a Floating-point Queue (FQ) of
FPop instructions pending completion, and can force the IU to wait until resource and data
dependencies have been resolved.

The architecture ensures that a program containing FPops generates the same numerical results
as if there were no concurrency.

After the FPU begins to execute an FPop, the IU continues to fetch and execute instructions until
one of five "hold" conditions occurs. Any one of these causes the IU to stop fetching instructions
until the condition is no longer true:

1) If, for a load floating-point register instruction, the destination f register is the source or
   destination register of an executing FPop, the IU waits until the executing FPop no longer
   requires the register.

2) If, for a store floating-point register instruction, the source f register is the destination register
   of an executing FPop, the IU waits until the executing FPop no longer requires the register.

3) A load or store floating-point state register instruction (LDFSR, STFSR) causes the IU to wait
   until all executing and pending FPops have completed.

4) A branch on floating-point condition (FBfcc) instruction causes the IU to wait until any
   executing or pending floating-point compare instructions (FCMP, FCMPE) have finished.

5) When the IU encounters an FPop, it stops fetching instructions until the FPop has been
   accepted by the FPU.

C.7.1. Floating-Point Queue (FQ) 

The floating-point queue (FQ) has at least one entry for each of the FPU's arithmetic units that
can execute in parallel with other arithmetic units. The depth of the queue is
implementation-dependent.

Each entry in the queue (for the purposes of the definition in this appendix) contains 1) the FPop
instruction itself, 2) the PC from which the FPop was fetched, 3) an indication of the arithmetic
unit executing it, 4) a completion status bit that indicates whether the operation finished properly,
and 5) a temporary result, including any exceptions or condition codes generated by the
instruction. Parts (1) and (2) of the front entry are visible to the programmer using the STDFQ
instruction; the other parts and the other entries are invisible to the programmer.

(Note that load floating-point, store floating-point, and FBfcc instructions are never entered in the
queue.)

For the purposes of the definition in this appendix, when an arithmetic unit finishes, it deposits its
computed result, any exceptions or conditions it may have generated, and a completion status
bit, into the reserved location in the queue. As FPops complete, each entry moves toward the
front of the queue (if it is not already there).

The FPU can stop executing an FPop in one of four ways: 1) completed without exception
(normal), 2) IEEE_exception, 3) unfinished_FPop, or 4) unimplemented_FPop. The following
paragraphs describe each:

Normal Completion

If the FPop represented by the front entry in the queue caused no unmasked exceptions, the
FPU 1) writes the result into the f register(s) specified by the rd field of the instruction (if
any), 2) updates the FSR's cexc and fcc fields, 3) removes the entry from the queue, and 4)
advances the queue.






IEEE_Exception

If the FPop pointed to by the front entry in the queue caused an IEEE_exception trap, the
FPU updates the FSR's cexc and ftt fields to identify the exception, and does not write the
result into the f register(s) specified by the rd field of the instruction, nor does it remove the
entry from the queue. However, if an IEEE_exception does not result in a fp_exception trap,
all results are written, including the destination f register, cexc, aexc, and fcc.

Unimplemented_FPop or Unfinished_FPop

If the FPop pointed to by the front entry in the queue is not implemented, or if the arithmetic
unit was unable to complete it according to the ANSI/IEEE 754-1985 specification (for
example, a multiply unit may not be able to postnormalize a denormalized result or handle a NaN),
the FPU updates the ftt field of the FSR to identify the exception, and does not write the
result into the f register(s) specified by the rd field of the instruction, nor does it remove the
entry from the queue. The front entry in the queue identifies the FPop that generated the
floating-point exception trap.

C.7.2. FQ_Front_Done 

The implementation-dependent macro 'FQ_front_done' returns a 1 if an arithmetic unit has
finished processing the FPop at the front of the FQ. The implementation-dependent macro
'stop_FPU' stops all current processing of FQ entries.

C.7.3. FPU States

The FPU can be in any of three modes: FPU_execute_mode, FPU_exception_pending_mode, or
FPU_exception_mode. In FPU_execute_mode, it executes floating-point instructions.

The FPU enters the FPU_exception_pending_mode state when an FPop instruction causes an
IEEE_exception, unfinished_FPop exception, unimplemented_FPop exception, or a
sequence_error. The FPU remains in FPU_exception_pending_mode until the IU fetches
another floating-point instruction, at which time a fp_exception trap is caused and the FPU enters
the FPU_exception_mode state.

In FPU_exception_mode, the FPU executes only store floating-point instructions. If an FPop or a
load floating-point instruction is fetched while the unit is in FPU_exception_mode, the ftt field of
the FSR will be updated to indicate "sequence_error", and the FPU will enter
FPU_exception_pending_mode. The instruction that caused the sequence_error is not entered
into the FQ.

The FPU returns to FPU_execute_mode after the FQ has been emptied via STDFQ instructions,
that is, when qne is 0.
    while (FPU_execute_mode) (
        if (FQ_front_done = 1) then (
            if (fp_unimplemented = 1) then (            { not implemented }
                fp_exception <- 1; ftt <- unimplemented_FPop;
            );
            if (FQ_c = 0) then (                        { not finished }
                fp_exception <- 1; ftt <- unfinished_FPop;
            ) else (                                    { executed and finished }
                cexc <- texc;
                next;
                if ( (cexc and TEM) ≠ 0) then (         { floating-point trap }
                    fp_exception <- 1; ftt <- IEEE_exception;
                ) else (                                { no floating-point trap }
                    aexc <- aexc or cexc;
                    if (FQ_single_result = 1) then
                        f[rd] <- result;
                    if (FQ_double_result = 1) then
                        f[rdE], f[rdO] <- result;
                    if (FQ_extended_result = 1) then
                        f[rdEE], f[rdEO], f[rdOE] <- result;
                    if (FQ_compare = 1) then
                        fcc <- tfcc;
                    dequeue_FQ;
                )
            )
        );
        next;
        if (fp_exception = 1) then (
            FPU_execute_mode <- 0;
            FPU_exception_pending_mode <- 1
        )
    );
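
The masked versus unmasked exception decision in the loop above can be sketched in Python. The function name complete_front_entry is hypothetical, and the sketch covers only the "executed and finished" branch: it takes the trap enable mask (TEM), the accrued exceptions (aexc), and the temporary exceptions of the front entry (texc) as 5-bit integers.

```python
def complete_front_entry(tem, aexc, texc):
    """Return (cexc, aexc, fp_exception) after the front FQ entry finishes.

    tem, aexc and texc are 5-bit masks ordered NV, OF, UF, DZ, NX
    (bit 4 down to bit 0), matching the FSR field layout.
    """
    cexc = texc & 0x1F
    if cexc & tem:
        # Unmasked exception: trap; aexc and the destination are untouched.
        return cexc, aexc, True
    # Masked (or no) exception: accumulate into aexc; the result is written.
    return cexc, aexc | cexc, False

NX = 0b00001
masked = complete_front_entry(tem=0b00000, aexc=0b00000, texc=NX)
unmasked = complete_front_entry(tem=NX, aexc=0b00000, texc=NX)
```

In the masked case the inexact bit is accumulated into aexc and no trap is signalled; in the unmasked case fp_exception is raised and aexc is left as it was.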
C.8. Coprocessor Instruction Execution

The CP can execute coprocessor operate (CPop) instructions concurrently with integer
instructions and other CPops. Although the instruction set includes a "store CP double queue"
instruction, the existence of the queue and the type of concurrency available in the coprocessor is
dependent on the coprocessor itself.

The FPU leaves FPU_exception_mode and enters FPU_execute_mode after the FQ has been
emptied (via execution of STDFQ instructions).

    execute_CP_instruction := ( {not specified} );

C.9. Traps
    execute_trap := (
        select_trap;
        ET <- 0;                                { ignore asynchronous traps }
        PS <- S;
        annul <- 0;
        CWP <- (CWP - 1) mod NWINDOWS;          { point to next window }
        r[17] <- PC;                            { preserve program counters }
        r[18] <- nPC;
        next;

        S <- 1;                                 { set supervisor mode }
        if (reset_trap = 0) then (
            PC <- TBR;
            nPC <- TBR + 4
        ) else (
            reset_trap <- 0;
            PC <- 0;
            nPC <- 4
        )
    );

    select_trap := (
        if (ET = 0 or reset_trap = 1) then
            error_mode <- 1
        else if (instruction_access_exception = 1) then
            tt <- 00000001₂
        else if (illegal_instruction = 1) then
            tt <- 00000010₂
        else if (privileged_instruction = 1) then
            tt <- 00000011₂
        else if (fp_disabled = 1) then
            tt <- 00000100₂
        else if (cp_disabled = 1) then
            tt <- 00100100₂
        else if (window_overflow = 1) then
            tt <- 00000101₂
        else if (window_underflow = 1) then
            tt <- 00000110₂
        else if (mem_address_not_aligned = 1) then
            tt <- 00000111₂
        else if (fp_exception = 1) then
            tt <- 00001000₂
        else if (cp_exception = 1) then
            tt <- 00101000₂
        else if (data_access_exception = 1) then
            tt <- 00001001₂
        else if (tag_overflow = 1) then
            tt <- 00001010₂
        else if (trap_instruction = 1) then
            tt <- 1₂□ticc_trap_type
        else if (interrupt_level > 0) then
            tt <- 0001₂□interrupt_level;
        next;

        trap <- 0;      { since the tt field has been set, reset the trap signal }
        reset_trap <- 0;
        instruction_access_exception <- 0;
        illegal_instruction <- 0;
        privileged_instruction <- 0;
        fp_disabled <- 0;
        cp_disabled <- 0;
        window_overflow <- 0;
        window_underflow <- 0;
        mem_address_not_aligned <- 0;
        fp_exception <- 0;
        cp_exception <- 0;
        data_access_exception <- 0;
        tag_overflow <- 0;
        trap_instruction <- 0;
        interrupt_level <- 0
    );

    check_interrupts := (
        if (bp_reset_in = 1) then (
            reset_mode <- 1
        ) else if (ET = 1 and (bp_IRL = 15 or bp_IRL > PIL) ) then (
            trap <- 1;
            interrupt_level <- bp_IRL
        );
    );
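
The if/else chain in select_trap is a fixed priority order. A Python sketch of the tt selection follows, using the tt values listed above. The function name select_tt and the dictionary of pending conditions are assumptions for illustration. The sketch also assumes that ticc_trap_type is 7 bits wide and interrupt_level 4 bits wide, so that each concatenation fills the 8-bit tt field. The ET and reset_trap check is not modeled.

```python
# (condition, tt) pairs in the priority order of the select_trap macro
TRAP_PRIORITY = [
    ("instruction_access_exception", 0b00000001),
    ("illegal_instruction",          0b00000010),
    ("privileged_instruction",       0b00000011),
    ("fp_disabled",                  0b00000100),
    ("cp_disabled",                  0b00100100),
    ("window_overflow",              0b00000101),
    ("window_underflow",             0b00000110),
    ("mem_address_not_aligned",      0b00000111),
    ("fp_exception",                 0b00001000),
    ("cp_exception",                 0b00101000),
    ("data_access_exception",        0b00001001),
    ("tag_overflow",                 0b00001010),
]

def select_tt(pending, ticc_trap_type=0, interrupt_level=0):
    """Return the 8-bit tt for the highest-priority pending condition."""
    for name, tt in TRAP_PRIORITY:
        if pending.get(name):
            return tt
    if pending.get("trap_instruction"):
        return 0x80 | (ticc_trap_type & 0x7F)     # 1 concatenated with 7 bits
    if interrupt_level > 0:
        return 0x10 | (interrupt_level & 0x0F)    # 0001 concatenated with 4 bits
    return None
```

For example, when both illegal_instruction and fp_disabled are pending, the earlier entry in the chain wins and tt is 2.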






C.10. Instruction Definitions

This section contains the ISP definitions of the SPARC architecture instructions. These
complement the instruction descriptions in Appendix B, Instruction Descriptions.

C.10.1. Load Instructions
    if (LDF or LDDF or LDFSR) then (
        if (EF = 0 or bp_FPU_present = 0) then (
            trap <- 1; fp_disabled <- 1
        ) else if (FPU_exception_mode = 1) then (
            ftt <- sequence_error;
            FPU_exception_mode <- 0;
            FPU_exception_pending_mode <- 1
        )
    );
    if ( (LDC or LDDC or LDCSR) and (EC = 0 or bp_CP_present = 0) ) then (
        trap <- 1; cp_disabled <- 1 );
    next;

    if (trap = 0) then (
        if (LDD or LD or LDSH or LDUH or LDSB or LDUB
            or LDDF or LDF or LDFSR or LDDC or LDC or LDCSR) then (
            address <- r[rs1] + (if i=0 then r[rs2] else sign_extend(simm13));
            addr_space <- (if (S = 0) then 10 else 11)
        ) else if (LDDA or LDA or LDSHA or LDUHA or LDSBA or LDUBA) then (
            if (S = 0) then (
                trap <- 1; privileged_instruction <- 1
            );
            address <- r[rs1] + r[rs2];
            addr_space <- asi
        );
    );
    next;

    if (trap = 0) then (
        if ( ((LDD or LDDA or LDDF or LDDC) and address<2:0> ≠ 0) or
             ((LD or LDA or LDF or LDFSR or LDC or LDCSR) and address<1:0> ≠ 0) or
             ((LDSH or LDSHA or LDUH or LDUHA) and address<0> ≠ 0) ) then (
            trap <- 1; mem_addr_not_aligned <- 1
        )
    );
    next;

    if (trap = 0) then (
        data <- memory_read(addr_space, address);
        MAE <- bp_memory_exception;
        next;
        if (MAE = 1) then (
            trap <- 1; data_access_exception <- 1
        ) else (
            if (LDSB or LDSBA or LDUB or LDUBA) then (
                if (address<1:0> = 0) byte <- data<31:24>
                else if (address<1:0> = 1) byte <- data<23:16>
                else if (address<1:0> = 2) byte <- data<15:8>
                else if (address<1:0> = 3) byte <- data<7:0>;
                next;
                if (LDSB or LDSBA) then
                    word0 <- sign_extend_byte(byte)
                else
                    word0 <- zero_extend_byte(byte)
            ) else if (LDSH or LDSHA or LDUH or LDUHA) then (
                if (address<1:0> = 0) halfword <- data<31:16>
                else if (address<1:0> = 2) halfword <- data<15:0>;
                next;
                if (LDSH or LDSHA) then
                    word0 <- sign_extend_halfword(halfword)
                else
                    word0 <- zero_extend_halfword(halfword)
            ) else
                word0 <- data
        )
    );
    next;

    if (trap = 0) then (

if ( rd * and (LD or LDA or LDSH or or LDSHA or LDUHA or LDUH or LDSB or LDSBA or I 

            r[rd] <- word0
        else if ( ((rd and 11110₂) ≠ 0) and (LDD or LDDA) ) then
            r[rd and 11110₂] <- word0
        else if (LDF) then
            f[rd] <- word0
        else if (LDFSR) then (
            wait_for_FAUs_to_complete;          { implementation-defined }
            FSR <- word0 )
        else if (LDC) then
            c[rd] <- word0
        else if (LDCSR) then
            CSR <- word0
    );
    next;

    if (trap = 0 and (LDD or LDDA or LDDF or LDDC) ) then (
        word1 <- memory_read(addr_space, address + 4);
        MAE <- bp_memory_exception;
        next;
        if (MAE = 1) then (
            trap <- 1; data_access_exception <- 1 )
        else if (LDD or LDDA) then
            r[rd or 1] <- word1
        else if (LDDF) then
            f[rd or 1] <- word1
        else if (LDDC) then
            c[rd or 1] <- word1
    );
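
The byte and halfword selection in the load definition follows a big-endian lane layout: address<1:0> = 0 selects the most significant byte of the fetched word. A Python sketch of that selection and the sign or zero extension is given below; the function names load_subword and sign_extend are introduced for this example, and alignment checks are assumed to have been done already.

```python
def sign_extend(value, width):
    """Interpret a width-bit vector as a two's complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def load_subword(data, address, size, signed):
    """Return word0 for a load of `size` bytes (1, 2 or 4) at `address`.

    data is the 32-bit word returned by memory_read for the containing
    word; the result is a 32-bit value.
    """
    if size == 4:
        return data & 0xFFFFFFFF
    nbits = 8 * size
    offset = address & 0x3
    shift = 32 - nbits - 8 * offset          # offset 0 is the top of the word
    field = (data >> shift) & ((1 << nbits) - 1)
    if signed:
        field = sign_extend(field, nbits) & 0xFFFFFFFF
    return field

data = 0x80FF7F01
```

For this data word, a signed byte load at offset 0 yields 0xFFFFFF80 and an unsigned one yields 0x80, matching the LDSB and LDUB cases of the definition.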
    if ( (STF or STDF or STFSR or STDFQ) and (EF = 0 or bp_FPU_present = 0) ) then (
        trap <- 1; fp_disabled <- 1 );
    if ( (STC or STDC or STCSR or STDCQ) and (EC = 0 or bp_CP_present = 0) ) then (
        trap <- 1; cp_disabled <- 1 );
    if (trap = 0) then (

if (STD or ST or STH or STB or STF or STDF or STFSR or STDFQ or STCSR or STC or STDC o 
            address <- r[rs1] + (if i=0 then r[rs2] else sign_extend(simm13));
            addr_space <- (if S=0 then 10 else 11)
        ) else if (STDA or STA or STHA or STBA) then (
            if (S = 0) then (
                trap <- 1; privileged_instruction <- 1
            ) else (
                address <- r[rs1] + r[rs2];
                addr_space <- asi;
            )
        );
    );
    next;

    if (trap = 0) then (
        if (STD or STDA or STDF or STDFQ or STDC or STDCQ) then (
            if (address<2:0> ≠ 0) then (
                trap <- 1; mem_addr_not_aligned <- 1 ) )
        else if (ST or STA or STF or STFSR or STC or STCSR) then (
            if (address<1:0> ≠ 0) then (
                trap <- 1; mem_addr_not_aligned <- 1 ) )
        else if (STH or STHA) then (
            if (address<0> ≠ 0) then (
                trap <- 1; mem_addr_not_aligned <- 1 ) )
    );
    next;

    if (trap = 0) then (
        if (STDF) then (
            byte_mask <- 1111₂; data0 <- f[rd and 1110₂] )
        else if (STDFQ) then (
            byte_mask <- 1111₂; data0 <- FQ.ADDR )
        else if (STDC) then (
            byte_mask <- 1111₂; data0 <- c[rd and 1110₂] )
        else if (STDCQ) then (
            byte_mask <- 1111₂; data0 <- CQ.ADDR )
        else if (STD or STDA) then (
            byte_mask <- 1111₂; data0 <- r[rd and 1110₂] )
        else if (ST or STA) then (
            byte_mask <- 1111₂; data0 <- r[rd] )
        else if (STH or STHA) then (
            if (address<1:0> = 0) then (
                byte_mask <- 1100₂; data0 <- shift_left_logical(r[rd], 16) )
            else if (address<1:0> = 2) then (
                byte_mask <- 0011₂; data0 <- r[rd] ) )
        else if (STB or STBA) then (
            if (address<1:0> = 0) then (
                byte_mask <- 1000₂; data0 <- shift_left_logical(r[rd], 24) )
            else if (address<1:0> = 1) then (
                byte_mask <- 0100₂; data0 <- shift_left_logical(r[rd], 16) )
            else if (address<1:0> = 2) then (
                byte_mask <- 0010₂; data0 <- shift_left_logical(r[rd], 8) )
            else if (address<1:0> = 3) then (
                byte_mask <- 0001₂; data0 <- r[rd] )
        );
    );
    next;

    if (trap = 0) then (
        memory_write(addr_space, address, byte_mask, data0);
        MAE <- bp_memory_exception;
        next;
        if (MAE = 1) then (
            trap <- 1; data_access_exception <- 1
        )
    );
    next;

    if (trap = 0) then (
        if (STD or STDA) then data1 <- r[rd or 1]
        else if (STDF) then data1 <- f[rd or 1]
        else if (STDFQ) then (
            data1 <- FQ.INSTR;
            dequeue_FQ;
            next;
            if (qne = 0) then (
                FPU_exception_mode <- 0;
                FPU_execute_mode <- 1
            )
        )
        else if (STDC) then data1 <- c[rd or 1]
        else if (STDCQ) then data1 <- CQ.INSTR;
        next;

        memory_write(addr_space, address + 4, 1111₂, data1);
        MAE <- bp_memory_exception;
        next;
        if (MAE = 1) then (
            trap <- 1; data_access_exception <- 1
        )
    );
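
The byte_mask and data0 cases for word, halfword, and byte stores reduce to a shift by the byte lane. A Python sketch under that reading is shown below; the function name store_lanes is hypothetical, and the sketch assumes the address has already passed the alignment checks above.

```python
def store_lanes(value, address, size):
    """Return (byte_mask, data0) for a store of `size` bytes (1, 2 or 4).

    value is the 32-bit contents of r[rd]; the returned data0 is the
    value shifted into the byte lanes selected by byte_mask.
    """
    offset = address & 0x3
    if size == 4:
        return 0b1111, value & 0xFFFFFFFF
    if size == 2:
        if offset == 0:
            return 0b1100, (value << 16) & 0xFFFFFFFF
        if offset == 2:
            return 0b0011, value & 0xFFFFFFFF
        raise ValueError("halfword store must be 2-byte aligned")
    if size == 1:
        shift = 8 * (3 - offset)             # offset 0 is the top byte lane
        return 0b1000 >> offset, (value << shift) & 0xFFFFFFFF
    raise ValueError("size must be 1, 2 or 4")
```

As in the ISP text, data0 is not masked down to the stored bytes; the byte_mask alone decides which lanes reach memory.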
C.10.3. Atomic Load-Store Unsigned Byte Instructions

if (LDSTUB) then (
    address <- r[rs1] + (if i=0 then r[rs2] else sign_extend(simm13));
    addr_space <- (if (S = 0) then 10 else 11)
) else if (LDSTUBA) then (
    if (S = 0) then (
        trap <- 1; privileged_instruction <- 1
    )
    address <- r[rs1] + r[rs2];
    addr_space <- asi
);

next;
if (trap = 0) then (
    pb_retain_bus <- 1;
    next;
    data <- memory_read(addr_space, address);
    MAE <- bp_memory_exception;
    next;
    if (MAE = 1) then (
        trap <- 1; data_access_exception <- 1
    ) else (
        if (address<1:0> = 0) word <- zero_extend_byte(data<31:24>)
        else if (address<1:0> = 1) word <- zero_extend_byte(data<23:16>)
        else if (address<1:0> = 2) word <- zero_extend_byte(data<15:8>)
        else if (address<1:0> = 3) word <- zero_extend_byte(data<7:0>);
        next;
        if (rd ≠ 0) then r[rd] <- word
    )
);

next;
if (trap = 0) then (
    if (address<1:0> = 0) then ( byte_mask <- 1000₂ )
    else if (address<1:0> = 1) then ( byte_mask <- 0100₂ )
    else if (address<1:0> = 2) then ( byte_mask <- 0010₂ )
    else if (address<1:0> = 3) then ( byte_mask <- 0001₂ )
    ;
    next;
    memory_write(addr_space, address, byte_mask, FFFFFFFF₁₆);
    MAE <- bp_memory_exception;
    next;
    pb_retain_bus <- 0;
    if (MAE = 1) then (
        trap <- 1; data_access_exception <- 1
    )
);
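
As an illustration only, the read-then-set-to-ones behavior described above is what makes this instruction usable as a test-and-set primitive. The sketch below models memory as a Python bytearray; the names ldstub and try_lock are hypothetical, and the bus-retention that makes the real instruction atomic is not modelled.

```python
def ldstub(memory, address):
    """Return the old byte at address and set that byte to all ones (0xFF)."""
    old = memory[address]
    memory[address] = 0xFF
    return old

def try_lock(memory, address):
    """Assumed convention: a lock byte of 0 means free; any other value means held."""
    return ldstub(memory, address) == 0
```

A caller that gets False from try_lock would typically retry until the holder stores zero back into the lock byte.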
C.10.4. Swap r Register with Memory Instructions

if (SWAP) then (
    address <- r[rs1] + (if i=0 then r[rs2] else sign_extend(simm13));
    addr_space <- (if (S = 0) then 10 else 11)
) else if (SWAPA) then (
    if (S = 0) then (
        trap <- 1; privileged_instruction <- 1
    )
    address <- r[rs1] + r[rs2];
    addr_space <- asi
);

next;
if (trap = 0) then (
    temp <- r[rd];
    pb_retain_bus <- 1;
    next;
    word <- memory_read(addr_space, address);
    MAE <- bp_memory_exception;
    next;
    if (MAE = 1) then (
        trap <- 1; data_access_exception <- 1
    ) else (
        if (rd ≠ 0) then r[rd] <- word
    )
);

next;
if (trap = 0) then (
    memory_write(addr_space, address, 1111₂, temp);
    MAE <- bp_memory_exception;
    next;
    pb_retain_bus <- 0;
    if (MAE = 1) then (
        trap <- 1; data_access_exception <- 1
    )
);
C.10.5. Add Instructions

operand2 := if i=0 then r[rs2] else sign_extend(simm13);

if (ADD or ADDcc) then
    result <- r[rs1] + operand2;
else if (ADDX or ADDXcc) then
    result <- r[rs1] + operand2 + C;
next;
if (rd ≠ 0) then
    r[rd] <- result;
if (ADDcc or ADDXcc) then (
    N <- result<31>;
    Z <- if result=0 then 1 else 0;
    V <- (r[rs1]<31> and operand2<31> and not result<31>) or
         (not r[rs1]<31> and not operand2<31> and result<31>);
    C <- (r[rs1]<31> and operand2<31>) or
         (not result<31> and (r[rs1]<31> or operand2<31>))
);
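
The condition-code equations above can be checked with a short sketch. It is an illustration rather than part of the specification; the function name addcc and its carry_in parameter (standing in for the C bit used by ADDX) are assumptions.

```python
MASK32 = 0xFFFFFFFF

def addcc(a, b, carry_in=0):
    """Return (result, N, Z, V, C) using the sign-bit equations from the ISP text."""
    a &= MASK32
    b &= MASK32
    result = (a + b + carry_in) & MASK32
    a31, b31, r31 = a >> 31, b >> 31, result >> 31
    n = r31
    z = 1 if result == 0 else 0
    # Overflow: both operands share a sign and the result has the other sign.
    v = (a31 & b31 & (r31 ^ 1)) | ((a31 ^ 1) & (b31 ^ 1) & r31)
    # Carry out of bit 31, expressed through the three sign bits.
    c = (a31 & b31) | ((r31 ^ 1) & (a31 | b31))
    return result, n, z, v, c
```

Adding 1 to 7FFFFFFF₁₆ sets N and V but not C, while adding 1 to FFFFFFFF₁₆ sets Z and C but not V.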
C.10.6. Tagged Add Instructions

operand2 := if i=0 then r[rs2] else sign_extend(simm13);

result <- r[rs1] + operand2;
next;

temp_v <- (r[rs1]<31> and operand2<31> and not result<31>) or
          (not r[rs1]<31> and not operand2<31> and result<31>) or
          (r[rs1]<1:0> ≠ 0 or operand2<1:0> ≠ 0);
next;
if (TADDccTV and temp_v = 1) then (
    trap <- 1; tag_overflow <- 1
) else (
    N <- result<31>;
    Z <- if result=0 then 1 else 0;
    V <- temp_v;
    C <- (r[rs1]<31> and operand2<31>) or
         (not result<31> and (r[rs1]<31> or operand2<31>));
    if (rd ≠ 0) then
        r[rd] <- result;
);
C.10.7. Subtract Instructions

operand2 := if i=0 then r[rs2] else sign_extend(simm13);

if (SUB or SUBcc) then
    result <- r[rs1] - operand2;
else if (SUBX or SUBXcc) then
    result <- r[rs1] - operand2 - C;
next;
if (rd ≠ 0) then
    r[rd] <- result;
if (SUBcc or SUBXcc) then (
    N <- result<31>;
    Z <- if result=0 then 1 else 0;
    V <- (r[rs1]<31> and not operand2<31> and not result<31>) or
         (not r[rs1]<31> and operand2<31> and result<31>);
    C <- (not r[rs1]<31> and operand2<31>) or
         (result<31> and (not r[rs1]<31> or operand2<31>))
);

C.10.8. Tagged Subtract Instructions

operand2 := if i=0 then r[rs2] else sign_extend(simm13);

result <- r[rs1] - operand2;
next;

temp_v <- (r[rs1]<31> and not operand2<31> and not result<31>) or
          (not r[rs1]<31> and operand2<31> and result<31>) or
          (r[rs1]<1:0> ≠ 0 or operand2<1:0> ≠ 0);
next;
if (TSUBccTV and temp_v = 1) then (
    trap <- 1; tag_overflow <- 1
) else (
    N <- result<31>;
    Z <- if result=0 then 1 else 0;
    V <- temp_v;
    C <- (not r[rs1]<31> and operand2<31>) or
         (result<31> and (not r[rs1]<31> or operand2<31>));
    if (rd ≠ 0) then
        r[rd] <- result;
);
C.10.9. Multiply Step Instruction

operand1 := (N xor V) □ r[rs1]<31:1>;
operand2 := (
    if (Y<0> = 0) then 0
    else if (i = 0) then r[rs2] else sign_extend(simm13)
);

result <- operand1 + operand2;
Y <- r[rs1]<0> □ Y<31:1>;
next;

if (rd ≠ 0) then
    r[rd] <- result;
N <- result<31>;
Z <- if result=0 then 1 else 0;
V <- (operand1<31> and operand2<31> and not result<31>) or
     (not operand1<31> and not operand2<31> and result<31>);
C <- (operand1<31> and operand2<31>) or
     (not result<31> and (operand1<31> or operand2<31>))
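
To show how the multiply step composes into a full multiplication, here is a minimal simulation sketch. It assumes the usual software sequence of 32 steps with the multiplicand followed by one step with a zero operand, then a correction when the multiplier is negative; the function names are hypothetical and only the N and V bits are tracked.

```python
MASK32 = 0xFFFFFFFF

def mulscc(p, y, n, v, m):
    """One multiply step: returns (result, new_y, new_n, new_v)."""
    op1 = ((n ^ v) << 31) | (p >> 1)       # (N xor V) concatenated with p<31:1>
    op2 = m if (y & 1) else 0              # add the multiplicand only if Y<0> = 1
    result = (op1 + op2) & MASK32
    new_y = ((p & 1) << 31) | (y >> 1)     # p<0> concatenated with Y<31:1>
    s1, s2, sr = op1 >> 31, op2 >> 31, result >> 31
    new_v = (s1 & s2 & (sr ^ 1)) | ((s1 ^ 1) & (s2 ^ 1) & sr)
    return result, new_y, sr, new_v

def signed_mul(multiplier, multiplicand):
    """Signed 32x32 -> 64-bit product built from 33 multiply steps."""
    a = multiplier & MASK32
    m = multiplicand & MASK32
    p, y, n, v = 0, a, 0, 0                # partial product cleared, N = V = 0
    for _ in range(32):
        p, y, n, v = mulscc(p, y, n, v, m)
    p, y, n, v = mulscc(p, y, n, v, 0)     # final step only shifts
    if a >> 31:                            # multiplier was negative: undo m * 2**32
        p = (p - m) & MASK32
    product = (p << 32) | y
    return product - (1 << 64) if product >> 63 else product
```

The high word ends up in the partial-product register and the low word in Y, which is why the final step must shift once more without adding.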

C.10.10. Logical Instructions

operand2 := if i=0 then r[rs2] else sign_extend(simm13);

if (AND or ANDcc) then result <- r[rs1] and operand2
else if (ANDN or ANDNcc) then result <- r[rs1] and not operand2
else if (OR or ORcc) then result <- r[rs1] or operand2
else if (ORN or ORNcc) then result <- r[rs1] or not operand2
else if (XOR or XORcc) then result <- r[rs1] xor operand2
else if (XNOR or XNORcc) then result <- r[rs1] xor not operand2;

next;

if (rd ≠ 0) then r[rd] <- result;

if (ANDcc or ANDNcc or ORcc or ORNcc or XORcc or XNORcc) then (
    N <- result<31>;
    Z <- if result=0 then 1 else 0;
    V <- 0;
    C <- 0
);
C.10.11. Shift Instructions

shift_count := if i=0 then r[rs2]<4:0> else shcnt;

if (SLL and rd ≠ 0) then
    r[rd] <- shift_left_logical(r[rs1], shift_count)
else if (SRL and rd ≠ 0) then
    r[rd] <- shift_right_logical(r[rs1], shift_count)
else if (SRA and rd ≠ 0) then
    r[rd] <- shift_right_arithmetic(r[rs1], shift_count)

C.10.12. SETHI Instruction

if (rd ≠ 0) then (
    r[rd]<31:10> <- imm22;
    r[rd]<9:0> <- 0
)

C.10.13. SAVE and RESTORE Instructions 

operand2 := if i=0 then r[rs2] else sign_extend(simm13);

if (SAVE) then (
    new_cwp <- (CWP - 1) mod NWINDOWS;
    next;
    if ((WIM and 2^new_cwp) ≠ 0) then (
        trap <- 1; window_overflow <- 1
    ) else (
        result <- r[rs1] + operand2;    {operands from old window}
        CWP <- new_cwp
    )
) else if (RESTORE) then (
    new_cwp <- (CWP + 1) mod NWINDOWS;
    next;
    if ((WIM and 2^new_cwp) ≠ 0) then (
        trap <- 1; window_underflow <- 1
    ) else (
        result <- r[rs1] + operand2;    {operands from old window}
        CWP <- new_cwp
    )
);

next;
if (trap = 0 and rd ≠ 0) then
    r[rd] <- result                     {destination in new window}
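
The window-pointer arithmetic and the WIM check above can be sketched as follows. This is only an illustration: the function names are hypothetical, the trap is modelled as a returned string, and the number of windows is passed in because it is implementation-dependent.

```python
def save(cwp, wim, nwindows):
    """Return (new CWP, trap name or None) for a SAVE."""
    new_cwp = (cwp - 1) % nwindows          # SAVE decrements the window pointer
    if wim & (1 << new_cwp):                # target window marked invalid in WIM
        return cwp, "window_overflow"
    return new_cwp, None

def restore(cwp, wim, nwindows):
    """Return (new CWP, trap name or None) for a RESTORE."""
    new_cwp = (cwp + 1) % nwindows          # RESTORE increments the window pointer
    if wim & (1 << new_cwp):
        return cwp, "window_underflow"
    return new_cwp, None
```

With window 0 marked invalid, a SAVE executed in window 1 traps, while a SAVE executed in window 0 wraps around to the highest-numbered window.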
C.10.14. Branch on Integer Condition Instructions

eval_icc := (
    if (BNE and (Z = 0)) then 1 else 0;
    if (BE and (Z = 1)) then 1 else 0;
    if (BG and ((Z or (N xor V)) = 0)) then 1 else 0;
    if (BLE and ((Z or (N xor V)) = 1)) then 1 else 0;
    if (BGE and ((N xor V) = 0)) then 1 else 0;
    if (BL and ((N xor V) = 1)) then 1 else 0;
    if (BGU and (C = 0 and Z = 0)) then 1 else 0;
    if (BLEU and (C = 1 or Z = 1)) then 1 else 0;
    if (BCC and (C = 0)) then 1 else 0;
    if (BCS and (C = 1)) then 1 else 0;
    if (BPOS and (N = 0)) then 1 else 0;
    if (BNEG and (N = 1)) then 1 else 0;
    if (BVC and (V = 0)) then 1 else 0;
    if (BVS and (V = 1)) then 1 else 0;
    if (BA) then 1;
    if (BN) then 0
);

PC <- nPC;
if (eval_icc = 1) then (
    nPC <- PC + sign_extend(disp22 □ 00₂);
    if (BA and a = 1) then
        annul <- 1
) else (
    nPC <- nPC + 4;
    if (a = 1) then
        annul <- 1
)
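
A compare followed by a conditional branch can be sketched by combining the subtract condition codes with the branch conditions listed above. This is an illustration only; the helper names subcc_flags, CONDITIONS and eval_icc are hypothetical Python names, not part of the ISP notation.

```python
MASK32 = 0xFFFFFFFF

def subcc_flags(a, b):
    """Return (N, Z, V, C) as set by a compare (subcc a, b, %g0)."""
    a &= MASK32
    b &= MASK32
    r = (a - b) & MASK32
    a31, b31, r31 = a >> 31, b >> 31, r >> 31
    v = (a31 & (b31 ^ 1) & (r31 ^ 1)) | ((a31 ^ 1) & b31 & r31)
    c = ((a31 ^ 1) & b31) | (r31 & ((a31 ^ 1) | b31))
    return r31, int(r == 0), v, c

CONDITIONS = {
    "BNE":  lambda n, z, v, c: z == 0,
    "BE":   lambda n, z, v, c: z == 1,
    "BG":   lambda n, z, v, c: (z | (n ^ v)) == 0,
    "BLE":  lambda n, z, v, c: (z | (n ^ v)) == 1,
    "BGE":  lambda n, z, v, c: (n ^ v) == 0,
    "BL":   lambda n, z, v, c: (n ^ v) == 1,
    "BGU":  lambda n, z, v, c: c == 0 and z == 0,
    "BLEU": lambda n, z, v, c: c == 1 or z == 1,
    "BCC":  lambda n, z, v, c: c == 0,
    "BCS":  lambda n, z, v, c: c == 1,
    "BPOS": lambda n, z, v, c: n == 0,
    "BNEG": lambda n, z, v, c: n == 1,
    "BVC":  lambda n, z, v, c: v == 0,
    "BVS":  lambda n, z, v, c: v == 1,
    "BA":   lambda n, z, v, c: True,
    "BN":   lambda n, z, v, c: False,
}

def eval_icc(mnemonic, n, z, v, c):
    """Return 1 if the named branch is taken for the given condition codes."""
    return int(CONDITIONS[mnemonic](n, z, v, c))
```

Comparing -1 with 1 shows the signed and unsigned views diverging: BL is taken because -1 is less than 1 as a signed value, and BGU is also taken because FFFFFFFF₁₆ is greater than 1 as an unsigned value.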
C.10.15. Floating-Point Branch on Condition Instructions

E := if fcc=0 then 1 else 0;
L := if fcc=1 then 1 else 0;
G := if fcc=2 then 1 else 0;
U := if fcc=3 then 1 else 0;

eval_fcc := (
    if (FBU and U) then 1 else 0;
    if (FBG and G) then 1 else 0;
    if (FBUG and (G or U)) then 1 else 0;
    if (FBL and L) then 1 else 0;
    if (FBUL and (L or U)) then 1 else 0;
    if (FBLG and (L or G)) then 1 else 0;
    if (FBNE and (L or G or U)) then 1 else 0;
    if (FBE and E) then 1 else 0;
    if (FBUE and (E or U)) then 1 else 0;
    if (FBGE and (E or G)) then 1 else 0;
    if (FBUGE and (E or G or U)) then 1 else 0;
    if (FBLE and (E or L)) then 1 else 0;
    if (FBULE and (E or L or U)) then 1 else 0;
    if (FBO and (E or L or G)) then 1 else 0;
    if (FBA) then 1;
    if (FBN) then 0
);

PC <- nPC;
if (eval_fcc = 1) then (
    nPC <- PC + sign_extend(disp22 □ 00₂);
    if (FBA and (a = 1)) then
        annul <- 1
) else (
    nPC <- nPC + 4;
    if (a = 1) then
        annul <- 1
)
C.10.16. Coprocessor Branch on Condition Instructions

C0 := if bp_CP_cc<1:0>=0 then 1 else 0;
C1 := if bp_CP_cc<1:0>=1 then 1 else 0;
C2 := if bp_CP_cc<1:0>=2 then 1 else 0;
C3 := if bp_CP_cc<1:0>=3 then 1 else 0;

eval_bp_CP_cc := (
    if (CB3 and C3) then 1 else 0;
    if (CB2 and C2) then 1 else 0;
    if (CB23 and (C2 or C3)) then 1 else 0;
    if (CB1 and C1) then 1 else 0;
    if (CB13 and (C1 or C3)) then 1 else 0;
    if (CB12 and (C1 or C2)) then 1 else 0;
    if (CB123 and (C1 or C2 or C3)) then 1 else 0;
    if (CB0 and C0) then 1 else 0;
    if (CB03 and (C0 or C3)) then 1 else 0;
    if (CB02 and (C0 or C2)) then 1 else 0;
    if (CB023 and (C0 or C2 or C3)) then 1 else 0;
    if (CB01 and (C0 or C1)) then 1 else 0;
    if (CB013 and (C0 or C1 or C3)) then 1 else 0;
    if (CB012 and (C0 or C1 or C2)) then 1 else 0;
    if (CBA) then 1;
    if (CBN) then 0
);

PC <- nPC;
if (eval_bp_CP_cc = 1) then (
    nPC <- PC + sign_extend(disp22 □ 00₂);
    if (CBA and (a = 1)) then
        annul <- 1
) else (
    nPC <- nPC + 4;
    if (a = 1) then
        annul <- 1
)

C.10.17. CALL Instruction

r[15] <- PC;
PC <- nPC;
nPC <- PC + disp30 □ 00₂
C.10.18. Jump and Link Instruction

jump_address <- r[rs1] + (if i=0 then r[rs2] else sign_ext(simm13));
next;
if (jump_address<1:0> ≠ 0) then (
    trap <- 1;
    mem_address_not_aligned <- 1
) else (
    if (rd ≠ 0) then r[rd] <- PC;
    PC <- nPC;
    nPC <- jump_address
)



C.10.19. Return from Trap Instruction 

new_cwp <- (CWP + 1) mod NWINDOWS;
address <- r[rs1] + (if i=0 then r[rs2] else sign_extend(simm13));
next;

if (ET) then (
    trap <- 1;
    illegal_instruction <- 1
) else if (S = 0) then (
    trap <- 1;
    privileged_instruction <- 1
) else if ((WIM and 2^new_cwp) ≠ 0) then (
    trap <- 1;
    window_underflow <- 1
) else if (address<1:0> ≠ 0) then (
    trap <- 1;
    mem_address_not_aligned <- 1
) else (
    ET <- 1;
    PC <- nPC;
    nPC <- address;
    CWP <- new_cwp;
    S <- pS
)
C.10.20. Trap on Integer Condition Instructions

trap_eval_icc := (
    if (TNE and (Z = 0)) then 1 else 0;
    if (TE and (Z = 1)) then 1 else 0;
    if (TG and ((Z or (N xor V)) = 0)) then 1 else 0;
    if (TLE and ((Z or (N xor V)) = 1)) then 1 else 0;
    if (TGE and ((N xor V) = 0)) then 1 else 0;
    if (TL and ((N xor V) = 1)) then 1 else 0;
    if (TGU and (C = 0 and Z = 0)) then 1 else 0;
    if (TLEU and (C = 1 or Z = 1)) then 1 else 0;
    if (TCC and (C = 0)) then 1 else 0;
    if (TCS and (C = 1)) then 1 else 0;
    if (TPOS and (N = 0)) then 1 else 0;
    if (TNEG and (N = 1)) then 1 else 0;
    if (TVC and (V = 0)) then 1 else 0;
    if (TVS and (V = 1)) then 1 else 0;
    if (TA) then 1;
    if (TN) then 0
);

trap_number := r[rs1] + (if i=0 then r[rs2] else sign_ext(simm13));

if (Ticc) then (
    if (trap_eval_icc = 1) then (
        trap <- 1;
        trap_instruction <- 1;
        ticc_trap_type <- trap_number<6:0>
    ) else (
        PC <- nPC;
        nPC <- nPC + 4
    )
);
C.10.21. Read State Register Instructions

if ((RDPSR or RDWIM or RDTBR) and S = 0) then (
    trap <- 1;
    privileged_instruction <- 1
) else if (rd ≠ 0) then (
    if (RDY) then
        r[rd] <- Y
    else if (RDPSR) then
        r[rd] <- PSR
    else if (RDWIM) then
        r[rd] <- WIM
    else if (RDTBR) then
        r[rd] <- TBR;
);

C.10.22. Write State Register Instructions

operand2 := if i=0 then r[rs2] else sign_extend(simm13);
result := r[rs1] xor operand2;

if (WRY) then
    Y' <- result
else if (WRPSR) then (
    if (result<4:0> ≥ NWINDOWS) then (
        trap <- 1;
        illegal_instruction <- 1
    ) else if (S = 0) then (
        trap <- 1;
        privileged_instruction <- 1
    ) else
        PSR' <- result
) else if (WRWIM) then (
    if (S = 0) then (
        trap <- 1;
        privileged_instruction <- 1
    ) else
        WIM' <- result
) else if (WRTBR) then (
    if (S = 0) then (
        trap <- 1;
        privileged_instruction <- 1
    ) else
        TBR' <- result
);
C.10.23. Unimplemented Instruction

trap <- 1;
illegal_instruction <- 1

C.10.24. Instruction Cache Flush Instruction

address := r[rs1] + (if i=0 then r[rs2] else sign_extend(simm13));

if (IU_cache_present) then
    flush_IU_cache_word(address)        {implementation-dependent}
else if (bp_I_cache_present) then (
    trap <- 1;
    illegal_instruction <- 1
)
C.11. Floating-Point Operate Instructions

The multiple precision FPops use the following notation to indicate f register alignment:

double precision

    rs1E := rs1<4:1> □ 0₂;    rs1O := rs1<4:1> □ 1₂;
    rs2E := rs2<4:1> □ 0₂;    rs2O := rs2<4:1> □ 1₂;
    rdE  := rd<4:1> □ 0₂;     rdO  := rd<4:1> □ 1₂

extended precision

    rs1EE := rs1<4:2> □ 00₂;  rs1EO := rs1<4:2> □ 01₂;  rs1OE := rs1<4:2> □ 10₂;
    rs2EE := rs2<4:2> □ 00₂;  rs2EO := rs2<4:2> □ 01₂;  rs2OE := rs2<4:2> □ 10₂;
    rdEE  := rd<4:2> □ 00₂;   rdEO  := rd<4:2> □ 01₂;   rdOE  := rd<4:2> □ 10₂

Most of the floating-point routines defined below (or not defined since they are implementation-
dependent) return: (1) a single, double, or extended result; (2) a 5-bit exception vector (texc)
similar to the cexc field of the FSR, or a 2-bit condition code vector (tfcc) identical to the fcc field
of the FSR; and (3) a completion status bit (c) which indicates whether the arithmetic unit was
able to complete the operation.
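
The register alignment notation amounts to clearing the low one or two bits of the 5-bit register number and then selecting consecutive registers. A small sketch, with hypothetical function names, makes this concrete:

```python
def double_pair(reg):
    """Return (even, odd) f register numbers for a double precision operand."""
    base = reg & 0b11110                   # reg<4:1> concatenated with 0
    return base, base | 0b1

def extended_triple(reg):
    """Return (EE, EO, OE) f register numbers for an extended precision operand."""
    base = reg & 0b11100                   # reg<4:2> concatenated with 00
    return base, base | 0b01, base | 0b10
```

For instance, a double precision operand named by register 5 occupies f4 and f5, and an extended precision operand named by register 7 occupies f4, f5 and f6.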



C.11.1. Convert Integer to Floating-Point Instructions 

if (FiTOs) then
    result, texc, c <- cvt_integer_to_single(f[rs2])
else if (FiTOd) then
    result, texc, c <- cvt_integer_to_double(f[rs2])
else if (FiTOx) then
    result, texc, c <- cvt_integer_to_extended(f[rs2])

C.11.2. Convert Floating-Point to Integer

if (FsTOi) then
    result, texc, c <- cvt_single_to_integer(f[rs2])
else if (FdTOi) then
    result, texc, c <- cvt_double_to_integer(f[rs2E] □ f[rs2O])
else if (FxTOi) then
    result, texc, c <- cvt_extended_to_integer(f[rs2EE] □ f[rs2EO] □ f[rs2OE])
C.11.3. Convert Between Floating-Point Formats Instructions

if (FsTOd) then
    result, texc, c <- cvt_single_to_double(f[rs2])
else if (FsTOx) then
    result, texc, c <- cvt_single_to_extended(f[rs2])
else if (FdTOs) then
    result, texc, c <- cvt_double_to_single(f[rs2E] □ f[rs2O])
else if (FdTOx) then
    result, texc, c <- cvt_double_to_extended(f[rs2E] □ f[rs2O])
else if (FxTOs) then
    result, texc, c <- cvt_extended_to_single(f[rs2EE] □ f[rs2EO] □ f[rs2OE])
else if (FxTOd) then
    result, texc, c <- cvt_extended_to_double(f[rs2EE] □ f[rs2EO] □ f[rs2OE])

C.11.4. Floating-Point Move Instructions

if (FMOVs) then
    result <- f[rs2]
else if (FNEGs) then
    result <- f[rs2] xor 80000000₁₆
else if (FABSs) then
    result <- f[rs2] and 7FFFFFFF₁₆;
texc <- 0;
c <- 1
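
Because negate and absolute value touch only the sign bit, they can be tried out on ordinary single precision bit patterns. The sketch below uses Python's struct module for the conversions; the helper names are hypothetical.

```python
import struct

def f32_bits(x):
    """Return the 32-bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_f32(bits):
    """Return the float encoded by a 32-bit single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fnegs(bits):
    return bits ^ 0x80000000               # flip the sign bit

def fabss(bits):
    return bits & 0x7FFFFFFF               # clear the sign bit
```

Since no arithmetic is involved, these operations cannot raise a floating-point exception, which is consistent with the listing setting texc to 0 and c to 1.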

C.11.5. Floating-Point Square Root Instructions

if (FSQRTs) then
    result, texc, c <- sqrt_single(f[rs2])
else if (FSQRTd) then
    result, texc, c <- sqrt_double(f[rs2E] □ f[rs2O])
else if (FSQRTx) then
    result, texc, c <- sqrt_extended(f[rs2EE] □ f[rs2EO] □ f[rs2OE])
C.11.6. Floating-Point Add and Subtract Instructions

if (FADDs) then
    result, texc, c <- add_single(f[rs1], f[rs2])
else if (FSUBs) then
    result, texc, c <- sub_single(f[rs1], f[rs2])
else if (FADDd) then
    result, texc, c <- add_double(f[rs1E] □ f[rs1O], f[rs2E] □ f[rs2O])
else if (FSUBd) then
    result, texc, c <- sub_double(f[rs1E] □ f[rs1O], f[rs2E] □ f[rs2O])
else if (FADDx) then
    result, texc, c <- add_extended(f[rs1EE] □ f[rs1EO] □ f[rs1OE],
                                    f[rs2EE] □ f[rs2EO] □ f[rs2OE])
else if (FSUBx) then
    result, texc, c <- sub_extended(f[rs1EE] □ f[rs1EO] □ f[rs1OE],
                                    f[rs2EE] □ f[rs2EO] □ f[rs2OE])

C.11.7. Floating-Point Multiply and Divide Instructions

if (FMULs) then
    result, texc, c <- mul_single(f[rs1], f[rs2])
else if (FDIVs) then
    result, texc, c <- div_single(f[rs1], f[rs2])
else if (FMULd) then
    result, texc, c <- mul_double(f[rs1E] □ f[rs1O], f[rs2E] □ f[rs2O])
else if (FDIVd) then
    result, texc, c <- div_double(f[rs1E] □ f[rs1O], f[rs2E] □ f[rs2O])
else if (FMULx) then
    result, texc, c <- mul_extended(f[rs1EE] □ f[rs1EO] □ f[rs1OE],
                                    f[rs2EE] □ f[rs2EO] □ f[rs2OE])
else if (FDIVx) then
    result, texc, c <- div_extended(f[rs1EE] □ f[rs1EO] □ f[rs1OE],
                                    f[rs2EE] □ f[rs2EO] □ f[rs2OE])

C.11.8. Floating-Point Compare Instructions

if (FCMPs) then
    tfcc, texc, c <- compare_single(f[rs1], f[rs2])
else if (FCMPd) then
    tfcc, texc, c <- compare_double(f[rs1E] □ f[rs1O], f[rs2E] □ f[rs2O])
else if (FCMPx) then
    tfcc, texc, c <- compare_extended(f[rs1EE] □ f[rs1EO] □ f[rs1OE],
                                      f[rs2EE] □ f[rs2EO] □ f[rs2OE])
else if (FCMPEs) then
    tfcc, texc, c <- compare_e_single(f[rs1], f[rs2])
else if (FCMPEd) then
    tfcc, texc, c <- compare_e_double(f[rs1E] □ f[rs1O], f[rs2E] □ f[rs2O])
else if (FCMPEx) then
    tfcc, texc, c <- compare_e_extended(f[rs1EE] □ f[rs1EO] □ f[rs1OE],
                                        f[rs2EE] □ f[rs2EO] □ f[rs2OE])






APPENDIX D: SOFTWARE CONSIDERATIONS 



D.1. Introduction

This appendix describes how software can use the SPARC architecture effectively. It describes 
assumptions that compilers may make about the resources available, and how compilers can use 
them. It does not discuss how the operating system may use the architecture. 

How to use registers is typically a very important resource allocation problem for compilers. The
SPARC architecture provides windowed registers (in, out, local), global registers, and
floating-point registers.

D.1.1. In and Out Registers

The in and out registers are used primarily for passing parameters to subroutines and receiving
results from them, and for keeping track of the memory stack. When a routine is called, the
caller's outs become the callee's ins.

One of the caller's out registers is used as the stack pointer, SP. It points to an area in which the
system can store r16 through r31 when the register file overflows. It is essential that this
register have the correct value when the corresponding underflow trap occurs so that the
register window can be reloaded. It is also important that this register be kept up to date with
register window changes, and that the overhead for doing calls be kept as small as possible.
Since SP is in one of the caller's out registers, it can be used by the callee as its FP, and the
callee can use the SAVE instruction to set its own SP from its FP.

Up to six parameters* may be passed by placing them in the out registers; additional parameters 
are passed in the memory stack. When the callee is entered, the parameters passed in registers 
are now in its corresponding ins. One of the other two in/out registers is used as the caller's old
SP, which is also the current routine's frame pointer, FP (see below). The other is used to pass
the subroutine's return address. With the exception of SP, out registers may be used as tem- 
poraries between subroutine calls. 

If a routine is passed more than six parameters, the remainder are passed on the memory stack.
If, on the other hand, it is passed fewer than six parameters, it may use the other parameter
registers as if they were locals. If a register parameter has its address taken, it must be stored
on the memory stack, and used from there for the lifetime of the pointer (or for the extent of the
procedure, if the compiler cannot figure this out). A function returns its value by writing it into its
ins (which are the caller's outs).



D.1.2. Local Registers 

The locals are used for automatic variables and most temporaries. The compiler may also copy
parameters out of the memory stack into the locals and use them from there. If an automatic
variable has its address taken, it must be stored in the memory stack for the lifetime of the
pointer (or for the extent of the procedure, if the compiler cannot figure this out).

* Six is more than adequate, since the overwhelming majority of procedures in system code (at least 97%
measured statically, according to the studies cited by Weicker: Weicker, R.P., Dhrystone: A Synthetic Systems
Programming Benchmark, CACM 27:10, October 1984) take fewer than six parameters. The average number of
parameters, measured statically or dynamically, is no greater than 2.1 in any of these studies.

D.1.3. Global Registers

Unlike the ins, locals, and outs, the globals are not part of any register window, but are a single
set of registers with global scope, like the registers of a more traditional architecture. This means
that if they are used on a per-procedure basis, they must be saved and restored.

The global registers can be used for temporaries and for global variables or pointers, either
visible to the user or maintained as part of the program's execution environment. For instance, one
could by convention address all global scalars by offsets from register r7. This would allow 2^13
bytes of global scalars, and would enable access to these variables faster than if they were only
accessible via absolute addresses. This is because absolute addresses longer than 13 bits
require a SETHI instruction.
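
The 13-bit limit can be made concrete with a short sketch. The function names are hypothetical; the split into a 22-bit high part and a 10-bit low part follows the SETHI definition (imm22 loaded into bits 31:10, bits 9:0 cleared).

```python
def fits_simm13(value):
    """True if value can be encoded as a sign-extended 13-bit immediate."""
    return -4096 <= value <= 4095

def split_address(address):
    """Split a 32-bit address into (imm22 for SETHI, remaining low 10 bits)."""
    address &= 0xFFFFFFFF
    return address >> 10, address & 0x3FF
```

An offset from a base register such as r7 therefore covers 2^13 = 8192 distinct byte addresses in a single instruction, while an arbitrary 32-bit address first needs SETHI to build its upper 22 bits.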

D.1.4. Floating-Point Registers 

There are thirty-two 32-bit floating-point registers. They are accessed differently from the other
registers and cannot be moved to or from anything but memory. Like the global registers, they
must be managed by software. Compilers probably will not pass parameters in them, but will 
use them for user variables and compiler temporaries. Across a procedure call, either the caller 
saves the live floating-point registers, or the callee saves the ones it uses and subsequently 
restores them. 

in          r31     (i7)    return address
            r30     (FP)    frame pointer
            r29     (i5)    incoming param reg 5
            r28     (i4)    incoming param reg 4
            r27     (i3)    incoming param reg 3
            r26     (i2)    incoming param reg 2
            r25     (i1)    incoming param reg 1
            r24     (i0)    incoming param reg 0

local       r23     (l7)    local 7
            r22     (l6)    local 6
            r21     (l5)    local 5
            r20     (l4)    local 4
            r19     (l3)    local 3
            r18     (l2)    local 2
            r17     (l1)    local 1
            r16     (l0)    local 0

out         r15     (o7)    temp
            r14     (SP)    stack pointer
            r13     (o5)    outgoing param reg 5
            r12     (o4)    outgoing param reg 4
            r11     (o3)    outgoing param reg 3
            r10     (o2)    outgoing param reg 2
            r9      (o1)    outgoing param reg 1
            r8      (o0)    outgoing param reg 0

global      r7      (g7)    global 7
            r6      (g6)    global 6
            r5      (g5)    global 5
            r4      (g4)    global 4
            r3      (g3)    global 3
            r2      (g2)    global 2
            r1      (g1)    global 1
            r0      (g0)    0

floating    f31             floating-point value
point       ...
            f0              floating-point value


D.2. The Memory Stack

Parameters beyond the sixth are passed on the stack. Parameters which must be addressable
are stored in the stack. Space is reserved on the stack for passing a one-word hidden parameter.
This is used when the caller is expecting to be returned a C language struct by value; it gives
the address of stack space allocated by the caller for that purpose (see Section D.4). Space is
reserved on the stack for keeping the procedure's in and local registers, should the register stack
overflow. Automatic variables which must be addressable are kept there, as are some
compiler-generated temporaries. These include automatic arrays and automatic records. Space is
reserved on the stack for saving floating-point registers across calls. Space on the stack may be
dynamically allocated using the alloca function from the C library. Automatic variables on the
stack are addressed relative to FP, while temporaries and outgoing parameters are addressed
relative to SP. When a procedure is active, its stack frame appears as in Figure D-2.



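                                   PROGRAM STACK

                  |                                                    |
                  |  Previous Stack Frame                              |
                  |                                                    |
FP, old SP  --->  +----------------------------------------------------+
                  |  Local stack space for addressable automatics      |
                  +----------------------------------------------------+
                  |  Dynamically allocated stack space                 |
                  +----------------------------------------------------+
                  |  Local stack space for compiler temporaries        |
   Current        |  and saved floating-point registers                |
   Stack          +----------------------------------------------------+
   Frame          |  Outgoing parameters past the sixth                |
                  +----------------------------------------------------+
                  |  6 words into which callee may store register      |
                  |  arguments                                         |
                  +----------------------------------------------------+
                  |  One-word hidden parameter (address at which       |
                  |  callee should store aggregate return value)       |
                  +----------------------------------------------------+
                  |  16 words in which to save in and local registers  |
SP          --->  +----------------------------------------------------+

                                   Stack Growth
                          (Decreasing Memory Addresses)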
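
Reading the figure from SP upward gives byte offsets for the fixed areas at the bottom of the frame. The sketch below only adds up the word counts shown in the figure; the function and key names are hypothetical, and any alignment or padding rules an actual compiler applies are not covered.

```python
WORD = 4  # bytes per 32-bit word

def fixed_frame_offsets():
    """Return offsets from SP, in bytes, of the fixed areas of a stack frame."""
    register_save = 0                       # 16 words for in and local registers
    hidden_param = register_save + 16 * WORD
    register_args = hidden_param + 1 * WORD # 6 words for register arguments
    outgoing_params = register_args + 6 * WORD
    return {
        "register_save": register_save,
        "hidden_param": hidden_param,
        "register_args": register_args,
        "outgoing_params": outgoing_params,
    }
```

Under these assumptions, the first parameter passed on the stack (the seventh parameter) would sit 92 bytes above SP.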



D.3. Example Code 

In the following example we assume the following pseudo-instructions are provided by the
assembler:

        pseudo-instruction          equivalent instruction

        ret                         jmp %i7 + 8
        retl                        jmp %o7 + 8
        mov reg_or_imm, reg         or %g0, reg_or_imm, reg

The following code fragment shows a simple procedure call with a value returned, and the
procedure itself:

CALLER:
        int     i;                      /* in register %l7 */

        i = sum3( 1, 2, 3 );

        mov     1, %o0
        mov     2, %o1
        call    sum3
        mov     3, %o2                  ! last parameter in delay slot
        mov     %o0, %l7
        ...

CALLEE:
        int     sum3( a, b, c )
        int     a, b, c;                /* received %i0, %i1 and %i2 */
        {
                return a+b+c;
        }

sum3:
        save    %sp, -(16*4), %sp       ! setup new sp
        add     %i0, %i1, %l7           ! compute sum in local
        add     %l7, %i2, %l7
        ret
        restore %l7, 0, %o0             ! move result into output reg, restore

Since "sum3" does not call any subroutines (i.e. it is a "leaf" routine) it can be recoded as: 



sum3:
        add     %o0, %o1, %o3           ! use %o3 as a local
        retl                            ! can't use ret; use retl
        add     %o2, %o3, %o0



D.4. Functions Returning Aggregate Values 

Some programming languages, including C, some dialects of Pascal, and Modula-2, allow the 
user to define a function returning an aggregate value, such as a C struct or a Pascal record. 
Since such a value may not fit into the registers, another value returning protocol must be 
defined to return the result in memory. Reentrancy and efficiency considerations require that the 
memory used to hold such a return value be allocated by the function's caller. The address of
this memory area is passed as the one-word hidden parameter mentioned in the section The
Memory Stack in this appendix. Because of the lack of type safety in the C language, a function
should not assume that its caller is expecting an aggregate return value and has provided a valid
memory address. Thus some additional handshaking is required.

When a procedure expecting an aggregate function value is compiled, an UNIMP instruction is
placed after the delay-slot instruction following the call to the function in question. The immediate
field in this UNIMP instruction is the low-order twelve bits of the size in bytes of the aggregate
value expected. When an aggregate-returning function is about to return its value in the memory
allocated by its caller, it first tests for the presence of this UNIMP instruction in its caller's
instruction stream. If it is found, then the hidden parameter is assumed to be valid, and the function
returns control to the location following the unimplemented instruction. Otherwise, the hidden
parameter is assumed not to be valid, and no value can be returned. Conversely, if a
scalar-returning function is called when an aggregate value is expected, the function returns as usual,
executing the UNIMP instruction and causing a trap.
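
The callee's side of this handshake can be sketched as a check on the instruction word that follows the caller's delay slot. This is a hedged illustration: the function names are hypothetical, the field test relies on the UNIMP encoding (op = 0 in bits 31:30 and op2 = 0 in bits 24:22), and comparing the size field against the callee's own aggregate size is an assumption about how the test might be made.

```python
def is_unimp(word):
    """True if the 32-bit instruction word has the UNIMP opcode fields."""
    return (word >> 30) == 0 and ((word >> 22) & 0b111) == 0

def return_offset(word_after_delay_slot, aggregate_size):
    """Offset from the call instruction's address at which the callee resumes.

    12 skips the call, its delay slot, and the UNIMP word; 8 is the ordinary
    return, used when the handshake fails and no aggregate can be returned.
    """
    expected = aggregate_size & 0xFFF       # low-order twelve bits of the size
    if is_unimp(word_after_delay_slot) and (word_after_delay_slot & 0xFFF) == expected:
        return 12
    return 8
```

For example, a caller expecting a 24-byte struct would place a word with value 24 after the delay slot, and a callee returning a 24-byte aggregate would then return to the call address plus 12.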
APPENDIX E: EXAMPLE INTEGER MULTIPLICATION AND
DIVISION ROUTINES



E.1. Introduction

This appendix contains routines a SPARC architecture system might use to perform integer
multiplication and division.

In these examples, it is assumed that the assembler provides the following pseudo-instructions:

        Pseudo-instruction          Equivalent Instruction

        nop                         sethi 0, %g0
        jmp address                 jmpl address, %g0
        ret                         jmp %i7 + 8
        retl                        jmp %o7 + 8
        mov reg_or_imm, reg         or %g0, reg_or_imm, reg
        tst reg                     subcc reg, %g0, %g0
        neg reg                     sub %g0, reg, reg
        cmp reg, reg_or_imm         subcc reg, reg_or_imm, %g0
        inc reg                     add reg, 1, reg
        inccc reg                   addcc reg, 1, reg
        dec reg                     sub reg, 1, reg
        deccc reg                   subcc reg, 1, reg

It is also assumed that the assembler recognizes "/*...*/"-style comments, and "!" as the
beginning of a comment which extends to the end of the current line.
/*
 * Procedure to perform a 32-bit by 32-bit multiply.
 * Pass the multiplicand in %i0, and the multiplier in %i1.
 * The least significant 32 bits of the result are returned in %i0,
 * and the most significant in %i1.
 *
 * This code has an optimization built-in for short (less than 13-bit)
 * multiplies. Short multiplies require 26 or 27 instruction cycles, and
 * long ones require 47 to 51 instruction cycles. For two positive numbers
 * (the most common case) a long multiply takes 47 instruction cycles.
 * This code indicates that overflow has occurred by leaving the Z condition
 * code clear. The following call sequence would be used if you wish to
 * deal with overflow:
 *
 *      call    .mul
 *      nop                     (or set up last parameter here)
 *      bnz     overflow_code   (or tnz to overflow handler)
 *
 * Note that this is a leaf routine; i.e. it calls no other routines and does
 * all of its work in the out registers. Thus, the usual SAVE and RESTORE
 * instructions are not needed.
 */


•global .mul 



.mul : 



mov %oO, %y 

andncc %oO, Oxfff, %gC 

be mul_shortway 

andcc %gO, %gO, koA 

long multiply 



mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 



%o4. 


%ol, 


%o4. 


%ol, 


%o4. 


%ol, 


%o4, 


%ol, 


%o4. 


%Ol, 


%o4, 


%ol, 


%o4. 


%cl, 


%o4 , 


%ol. 


%o4, 


%ol. 


%o4, 


%cl. 


%o4. 


%ol. 


%o4. 


%ol. 


%o4. 


%ol. 


%o4. 


%ol, 


%o4. 


%ol, 


%o4. 


%ol. 


%o4. 


%ol. 


%o4. 


%ol, 


%o4. 


%ol. 


%o4, 


%ol, 


%o4. 


%C1, 



%o4 
%o4 
%o4 
%o4 
%o/. 
%o< 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 
%o4 



multiplier to Y register 

mask out lower 12 bits 

ca-^ do it the short way 

zero the partial product and clear N and V conditions 



first iteration of 33 
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mulscc 
mulscc 
roulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 
mulscc 



%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %ol, %o4 

%o4, %gO, %o4 



! 32nd iteration 

! last iteration only shifts 



        !
        ! If %o0 (multiplier) was negative, the result is:
        !       (%o0 * %o1) + %o1 * (2**32)
        ! We fix that here.
        !
        tst     %o0
        rd      %y, %o0
        bge     1f
        tst     %o0                 ! for when we check for overflow
        sub     %o4, %o1, %o4       ! bit 33 and up of the product are in
                                    ! %o4, so we don't have to shift %o1
        !
        ! We haven't overflowed if:
        !       low-order bits are positive and high-order bits are 0
        !       low-order bits are negative and high-order bits are -1
        !
        ! If you are not interested in detecting overflow,
        ! replace the following few instructions with:
        !
        !       1:      retl
        !               mov     %o4, %o1
        !
1:
        bge     2f                  ! if low-order bits were positive.
        addcc   %o4, %g0, %o1       ! return most sig. bits of prod and set
                                    ! Z appropriately (for positive product)
        retl                        ! leaf-routine return
        subcc   %o4, -1, %g0        ! set Z if high order bits are -1 (for negative product)
2:
        retl                        ! leaf-routine return
        nop
        !
        ! short multiply
        !
mul_shortway:
        mulscc  %o4, %o1, %o4       ! first iteration of 13
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4       ! 12th iteration
        mulscc  %o4, %g0, %o4       ! last iteration only shifts

        rd      %y, %o5
        sll     %o4, 12, %o0        ! left shift middle bits by 12 bits
        srl     %o5, 20, %o5        ! right shift low bits by 20 bits
        !
        ! We haven't overflowed if:
        !       low-order bits are positive and high-order bits are 0
        !       low-order bits are negative and high-order bits are -1
        !
        ! If you are not interested in detecting overflow,
        ! replace the following code with:
        !
        !       or      %o5, %o0, %o0
        !       retl
        !       mov     %o4, %o1
        !
        orcc    %o5, %o0, %o0       ! merge for true product
        bge     3f                  ! if low-order bits were positive.
        sra     %o4, 20, %o1        ! right shift high bits by 20 bits
                                    ! and put into %o1
        retl                        ! leaf-routine return
        subcc   %o1, -1, %g0        ! set Z if high order bits are -1 (for
                                    ! negative product)
3:
        retl                        ! leaf-routine return
        addcc   %o1, %g0, %g0       ! set Z if high order bits are 0
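
The overflow rule stated in the comments of .mul (the high word must be the sign extension of the low word) can be modeled in C. This is only a sketch of the rule, not of the multiply-step sequence itself: the function name mul_fits_32 and the use of a 64-bit product are assumptions made for illustration, and the narrowing conversions assume a two's-complement machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the signed product of a and b fits
 * in 32 bits, which is the case in which .mul leaves Z set. */
static int mul_fits_32(int32_t a, int32_t b, int32_t *lo, int32_t *hi)
{
    int64_t product = (int64_t)a * (int64_t)b;   /* exact 64-bit product */
    uint64_t bits = (uint64_t)product;

    assert(lo != 0 && hi != 0);
    *lo = (int32_t)(uint32_t)bits;               /* word returned in %o0 */
    *hi = (int32_t)(uint32_t)(bits >> 32);       /* word returned in %o1 */

    /* No overflow: low word non-negative and high word 0,
     * or low word negative and high word -1. */
    return (*lo >= 0) ? (*hi == 0) : (*hi == -1);
}
```

A caller that only needs the low word can ignore the return value, which corresponds to skipping the bnz test in the call sequence.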
/*
 * Procedure to perform a 32 by 32 unsigned multiply.
 * Pass the multiplier in %o0, and the multiplicand in %o1.
 * The least significant 32 bits of the result will be returned in %o0,
 * and the most significant in %o1.
 *
 * This code has an optimization built-in for short (less than 13 bit)
 * multiplies. Short multiplies require 25 instruction cycles, and long ones
 * require 46 or 48 instruction cycles.
 *
 * This code indicates that overflow has occurred, by leaving the Z condition
 * code clear. The following call sequence would be used if you wish to
 * deal with overflow:
 *
 *         call    .umul
 *         nop                     ! (or set up last parameter here)
 *         bnz     overflow_code   ! (or tnz to overflow handler)
 *
 * Note that this is a Leaf routine; i.e. it calls no other routines and does
 * all of its work in the Out registers. Thus, the usual SAVE and RESTORE
 * instructions are not needed.
 */

        .global .umul
.umul:
        or      %o0, %o1, %o4       ! logical or of multiplier and multiplicand
        mov     %o0, %y             ! multiplier to Y register
        andncc  %o4, 0xfff, %o5     ! mask out lower 12 bits
        be      mul_shortway        ! can do it the short way
        andcc   %g0, %g0, %o4       ! zero the partial product and clear N and V conditions
        !
        ! long multiply
        !
        mulscc  %o4, %o1, %o4       ! first iteration of 33
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4       ! 32nd iteration
        mulscc  %o4, %g0, %o4       ! last iteration only shifts

/*
 * Normally, with the shift-and-add approach, if both numbers are positive,
 * you get the correct result. With 32-bit twos-complement numbers,
 * -x can be represented as ((2 - (x/(2**32))) mod 2) * 2**32. To avoid
 * a lot of 2**32's, we can just move the radix point up to be just
 * to the left of the sign bit. So:
 *
 *       x *  y = (xy) mod 2
 *      -x *  y = (2 - x) mod 2 * y = (2y - xy) mod 2
 *       x * -y = x * (2 - y) mod 2 = (2x - xy) mod 2
 *      -x * -y = (2 - x) * (2 - y) = (4 - 2x - 2y + xy) mod 2
 *
 * For signed multiplies, we subtract (2**32) * x from the partial
 * product to fix this problem for negative multipliers (see multiply.s)
 * Because of the way the shift into the partial product is calculated
 * (N xor V), this term is automatically removed for the multiplicand,
 * so we don't have to adjust.
 *
 * But for unsigned multiplies, the high order bit wasn't a sign bit,
 * and the correction is wrong. So for unsigned multiplies where the
 * high order bit is one, we end up with xy - (2**32) * y. To fix it
 * we add y * (2**32).
 */

        tst     %o1
        bge     1f
        nop
        add     %o4, %o0, %o4
1:
        rd      %y, %o0             ! return least sig. bits of prod
        retl                        ! leaf-routine return
        addcc   %o4, %g0, %o1       ! delay slot; return high bits and set
                                    ! zero bit appropriately



        !
        ! short multiply
        !
mul_shortway:
        mulscc  %o4, %o1, %o4       ! first iteration of 13
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4       ! 12th iteration
        mulscc  %o4, %g0, %o4       ! last iteration only shifts

        rd      %y, %o5
        sll     %o4, 12, %o4        ! left shift partial product by 12 bits
        srl     %o5, 20, %o5        ! right shift product by 20 bits
        or      %o5, %o4, %o0       ! merge for true product
        !
        ! The delay instruction (addcc) moves zero into %o1,
        ! sets the zero condition code, and clears the other conditions.
        ! This is the equivalent result to a long umultiply which doesn't overflow.
        !
        retl                        ! leaf-routine return
        addcc   %g0, %g0, %o1
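
The fix-up at the end of the long path of .umul (add the multiplier into the high word when the multiplicand has its high-order bit set) rests on a simple identity, which the following C sketch checks. It models only the arithmetic, not the individual multiply steps; the name umul_model is hypothetical, and the conversion to int32_t assumes a two's-complement machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the .umul fix-up: the multiplicand is treated as
 * signed, as in the signed routine, and the multiplier is added into the
 * high word when the multiplicand's high-order bit is one. */
static uint64_t umul_model(uint32_t multiplier, uint32_t multiplicand)
{
    /* Signed view of the multiplicand: its value minus 2**32 when bit 31 is set. */
    int64_t signed_multiplicand = (int64_t)(int32_t)multiplicand;
    uint64_t product = (uint64_t)(signed_multiplicand * (int64_t)multiplier);

    assert(signed_multiplicand < 0 || (uint64_t)signed_multiplicand == multiplicand);
    if ((multiplicand >> 31) != 0)               /* tst %o1 ; bge 1f       */
        product += (uint64_t)multiplier << 32;   /* add %o4, %o0, %o4      */
    return product;                              /* low word, high word    */
}
```

The low 32 bits of the result correspond to %o0 and the high 32 bits to %o1.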
E.4. Division



Integer division implemented in software or microcode is usually done by a method such as the
non-restoring algorithm, which provides one digit of quotient per step. A W-by-W digit division of
radix-B digits is most easily achieved using 2*W-digit arithmetic.

E.4.1. Program 1 

A binary-radix, 16-digit version of this method is illustrated by the C language function in Program 
1, which performs an unsigned division, producing the quotient in Q and the remainder in R. 

#include <stdio.h>
#include <assert.h>

#define W 16    /* maximum number of bits in the dividend & divisor */

unsigned short
divide( dividend, divisor )
    unsigned short dividend, divisor;
{
    long R;             /* partial remainder -- need 2*W bits */
    unsigned short Q;   /* partial quotient */
    int iter;

    R = dividend;
    Q = 0;

    for ( iter = W; iter >= 0; iter -= 1 ) {
        assert( Q*divisor+R == dividend );
        if (R >= 0) {
            R -= divisor << iter;
            Q += 1 << iter;
        } else {
            R += divisor << iter;
            Q -= 1 << iter;
        }
    }

    if ( R < 0 ) {
        R += divisor;
        Q -= 1;
    }
    return Q;
}
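
Program 1 depends on the integer widths of its day (a long of exactly 2*W bits, and a Q that is allowed to wrap). As a hedged restatement of the same non-restoring loop with explicit widths, the sketch below uses 64-bit intermediates so that neither R nor Q can overflow. The name divide16 and the rem output parameter are assumptions added for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define W 16    /* bits in the dividend and divisor */

/* Non-restoring division as in Program 1, with explicit widths.
 * Returns the quotient and stores the remainder through rem (if non-null). */
static uint16_t divide16(uint16_t dividend, uint16_t divisor, uint16_t *rem)
{
    int64_t R = dividend;   /* partial remainder, may go negative */
    int64_t Q = 0;          /* partial quotient, may overshoot before fix-up */
    int iter;

    assert(divisor != 0);
    for (iter = W; iter >= 0; iter -= 1) {
        int64_t step = (int64_t)divisor << iter;
        int64_t bit = (int64_t)1 << iter;

        assert(Q * divisor + R == dividend);    /* loop invariant */
        if (R >= 0) { R -= step; Q += bit; }
        else        { R += step; Q -= bit; }
    }
    if (R < 0) {            /* final correction: at most one divisor too far */
        R += divisor;
        Q -= 1;
    }
    if (rem != 0)
        *rem = (uint16_t)R;
    return (uint16_t)Q;
}
```

The loop never restores R after an overshoot; it compensates on the next step by adding instead of subtracting, which is why a single correction after the loop is sufficient.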
E.4.2. Program 2

In the simple form shown above, this method has two drawbacks:

• It requires a 2*W-digit accumulator.

• It always requires W steps.

Both these problems may be overcome by estimating the quotient before the actual division is
carried out. This can cut the time required for a division from O(W) to O(log_B(quotient)).
Program 2 illustrates how this estimate may be used to reduce the number of divide steps required
and the size of the accumulator.
#include <stdio.h>
#include <assert.h>

#define W 32    /* maximum number of bits in a divisor or dividend */

#define Big_value (unsigned)(1<<(W-2))  /* 2 ^ (W-1) */

int
estimate_log_quotient( dividend, divisor )
    unsigned dividend, divisor;
{
    unsigned log_quotient;

    for (log_quotient = 0; log_quotient < W; log_quotient += 1 )
        if ( ( divisor << log_quotient ) > Big_value )
            break;
        else if ( ( divisor << log_quotient ) >= dividend )
            break;

    return log_quotient;
}

unsigned
divide( dividend, divisor )
    unsigned dividend, divisor;
{
    int R;          /* remainder */
    unsigned Q;     /* quotient */
    int iter;

    R = dividend;
    Q = 0;

    for ( iter = estimate_log_quotient( dividend, divisor ); iter >= 0; iter -= 1 ) {
        assert( Q*divisor+R == dividend );
        if (R >= 0) {
            R -= divisor << iter;
            Q += 1 << iter;
        } else {
            R += divisor << iter;
            Q -= 1 << iter;
        }
    }

    if ( R < 0 ) {
        R += divisor;
        Q -= 1;
    }
    return Q;
}
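
To make the saving from the estimate visible, the following sketch repeats the loop of Program 2 with a step counter. It is an illustration under stated assumptions, not the manual's code: the names estimate_shift and divide_counted are hypothetical, the dividend is restricted to values below 2^31 (the range a signed 32-bit R can hold), and 64-bit intermediates are used for R and Q.

```c
#include <assert.h>
#include <stdint.h>

#define W 32
#define BIG_VALUE ((uint32_t)1 << (W - 2))

/* Smallest shift at which the scaled divisor reaches the dividend, or
 * becomes too large to shift again (same tests as estimate_log_quotient). */
static int estimate_shift(uint32_t dividend, uint32_t divisor)
{
    int shift;

    for (shift = 0; shift < W; shift += 1) {
        uint32_t scaled = divisor << shift;
        if (scaled > BIG_VALUE || scaled >= dividend)
            break;
    }
    return shift;
}

/* Non-restoring division that starts at the estimated shift and counts
 * the divide steps it performs. Assumes dividend < 2^31 and divisor != 0. */
static uint32_t divide_counted(uint32_t dividend, uint32_t divisor, int *steps)
{
    int64_t R = dividend, Q = 0;
    int iter;

    assert(divisor != 0 && dividend <= INT32_MAX && steps != 0);
    *steps = 0;
    for (iter = estimate_shift(dividend, divisor); iter >= 0; iter -= 1) {
        int64_t step = (int64_t)divisor << iter;
        if (R >= 0) { R -= step; Q += (int64_t)1 << iter; }
        else        { R += step; Q -= (int64_t)1 << iter; }
        *steps += 1;
    }
    if (R < 0) { R += divisor; Q -= 1; }
    return (uint32_t)Q;
}
```

For example, dividing 1000 by 10 starts at a shift of 7 and therefore takes 8 divide steps, where the fixed-length loop of Program 1 (scaled to W = 32) would take 33.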
E.4.3. Program 3

Another way of reducing the number of division steps required is to choose a larger base, B'.
This is only feasible if the cost of the radix-B' inner loop does not exceed the cost of the radix-B
inner loop by more than log_B(B'). When B' = B^N for some integer N, a radix-B' inner loop can
easily be constructed from the radix-B inner loop by arranging an N-high, B-ary decision tree.
Programs 3 and 4 illustrate how this can be done. Program 3 uses N-level recursion to show the
principle, but the overhead of recursion in this example far outweighs the loop overhead saved
by reducing the number of steps required. Program 4 shows how run-time recursion can be
eliminated if N is fixed at two.
#include <stdio.h>
#include <assert.h>

#define W 32    /* bits in a word */

int B,          /* number base of division (must be a power of 2) */
    N;          /* log2(B) */

#define WB (W/N)    /* base B digits in a word */
#define Big_value (unsigned)(B<<(WB-2))     /* B ^ (WB-1) */

int Q,          /* partial quotient */
    R,          /* partial remainder */
    V;          /* multiple of the divisor */

int
estimate_log_quotient( dividend, divisor )
    unsigned dividend, divisor;
{
    unsigned log_quotient;

    for (log_quotient = 0; log_quotient < WB; log_quotient += 1 )
        if ( ( divisor << log_quotient*N ) > Big_value )
            break;
        else if ( ( divisor << log_quotient*N ) >= dividend )
            break;

    return log_quotient;
}

int
compute_digit( level, quotient_digit )
    int level, quotient_digit;
{
    if (R >= 0) {
        R -= V << level;
        quotient_digit += 1 << level;
    } else {
        R += V << level;
        quotient_digit -= 1 << level;
    }
    if (level > 0)
        return compute_digit( level-1, quotient_digit );
    else
        return quotient_digit;
}

unsigned
divide( dividend, divisor )
    unsigned dividend, divisor;
{
    int iter;

    B = (1<<(N));
    R = dividend;
    Q = 0;

    for ( iter = estimate_log_quotient( dividend, divisor ); iter >= 0; iter -= 1 ) {
        assert( Q*divisor+R == dividend );
        V = divisor << (iter*N);
        Q += compute_digit( N-1, 0 ) << (iter*N);
    }

    if ( R < 0 ) {
        R += divisor;
        Q -= 1;
    }
    return Q;
}

E.4.4. Program 4

#include <stdio.h>
#include <assert.h>

#define W 32    /* bits in a word */

#define B 4     /* number base of division (must be a power of 2) */

#define N 2     /* log2(B) */

#define WB (W/N)    /* base B digits in a word */

#define Big_value (unsigned)(B<<(WB-2))     /* B ^ WB-1 */

int
estimate_log_quotient( dividend, divisor )
    unsigned dividend, divisor;
{
    unsigned log_quotient;

    for (log_quotient = 0; log_quotient < WB; log_quotient += 1 )
        if ( ( divisor << log_quotient*N ) > Big_value )
            break;
        else if ( ( divisor << log_quotient*N ) >= dividend )
            break;

    return log_quotient;
}

unsigned
divide( dividend, divisor )
    unsigned dividend, divisor;
{
    int Q,      /* partial quotient */
        R,      /* partial remainder */
        V;      /* multiple of the divisor */
    int iter;

    R = dividend;
    Q = 0;

    for ( iter = estimate_log_quotient( dividend, divisor ); iter >= 0; iter -= 1 ) {
        assert( Q*divisor+R == dividend );
        V = divisor << (iter*N);
        /* N-deep, B-wide decision tree */
        if ( R >= 0 ) {
            R -= V<<1;
            if ( R >= 0 ) {
                R -= V;
                Q += 3 << (N*iter);
            } else {
                R += V;
                Q += 1 << (N*iter);
            }
        } else {
            R += V<<1;
            if ( R >= 0 ) {
                R -= V;
                Q -= 1 << (N*iter);
            } else {
                R += V;
                Q -= 3 << (N*iter);
            }
        }
    }

    if ( R < 0 ) {
        R += divisor;
        Q -= 1;
    }

    return Q;
}
E.4.5. Program 5

At the risk of losing even more clarity, we can optimize away several of the bookkeeping
operations, as shown in Program 5.
#include <stdio.h>
#include <assert.h>

#define W 32    /* bits in a word */

#define B 4     /* number base of division (must be a power of 2) */

#define N 2     /* log2(B) */

#define WB (W/N)    /* base B digits in a word */

#define Big_value (unsigned)(B<<(WB-2))     /* B ^ WB-1 */

unsigned
divide( dividend, divisor )
    unsigned dividend, divisor;
{
    int Q,      /* partial quotient */
        R,      /* partial remainder */
        V;      /* multiple of the divisor */
    int iter;

    R = dividend;
    Q = 0;

    V = divisor;

    for ( iter = 0; V <= Big_value && V <= dividend; iter += 1 )
        V <<= N;

    for ( V <<= (N-1); iter >= 0; iter -= 1 ) {
        Q <<= N;
        assert( Q*(1<<(iter*N))*divisor+R == dividend );
        /* N-deep, B-wide decision tree */
        if ( R >= 0 ) {
            R -= V;
            V >>= 1;
            if ( R >= 0 ) {
                R -= V;
                V >>= 1;
                Q += 3;
            } else {
                R += V;
                V >>= 1;
                Q += 1;
            }
        } else {
            R += V;
            V >>= 1;
            if ( R >= 0 ) {
                R -= V;
                V >>= 1;
                Q -= 1;
            } else {
                R += V;
                V >>= 1;
                Q -= 3;
            }
        }
    }
    if ( R < 0 ) {
        R += divisor;
        Q -= 1;
    }
    return Q;
}
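
The two-level decision tree of Programs 4 and 5 can be exercised with the sketch below, which keeps the structure of Program 5 but uses 64-bit intermediates so that the comparand can be scaled past any 32-bit dividend without overflow. This is an illustrative variant under that assumption, not the manual's routine; the name divide_radix4 is hypothetical and the Big_value test is deliberately omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Radix-4 non-restoring division: two quotient bits per loop iteration,
 * selected by a two-level decision tree as in Program 5. */
static uint32_t divide_radix4(uint32_t dividend, uint32_t divisor)
{
    int64_t R = dividend;   /* partial remainder */
    int64_t Q = 0;          /* partial quotient */
    int64_t V = divisor;    /* scaled divisor (comparand) */
    int iter = 0;

    assert(divisor != 0);
    while (V <= R) {        /* scale V by 4 until it exceeds the dividend */
        V *= 4;
        iter += 1;
    }
    V *= 2;                 /* each iteration compares against 2*d*4^k, then d*4^k */

    for (; iter >= 0; iter -= 1) {
        Q *= 4;             /* make room for the next radix-4 digit */
        if (R >= 0) {
            R -= V; V /= 2;
            if (R >= 0) { R -= V; Q += 3; }
            else        { R += V; Q += 1; }
        } else {
            R += V; V /= 2;
            if (R >= 0) { R -= V; Q -= 1; }
            else        { R += V; Q -= 3; }
        }
        V /= 2;             /* comparand for the next iteration */
    }
    if (R < 0) {            /* non-restoring fix-up */
        R += divisor;
        Q -= 1;
    }
    return (uint32_t)Q;
}
```

Each pass through the tree contributes one of the odd digits -3, -1, +1 or +3; the even digits never occur because a non-restoring step always adds or subtracts.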




E.4.6. Program 6 

Program 6 is, essentially, the method we recommend for SPARC. The depth of the decision tree
(two in the preceding examples) is controlled by the constant N, and is currently set to three,
based on empirical evidence. The decision tree is not explicitly coded, but defined by the
recursive m4 macro DEVELOP_QUOTIENT_BITS. Other differences include:

• Handling of signed and unsigned operands

• More care is taken to avoid overflow for very large quotients or divisors

• Special tests are made for division by zero and zero quotient

• The routine is conditionally compiled for either division or remaindering.
/*
 * Division/Remainder
 *
 * Input is:
 *      dividend -- the thing being divided
 *      divisor -- how many ways to divide it
 * Important parameters:
 *      N -- how many bits per iteration we try to get
 *              as our current guess: define(N, 3)
 *      WORDSIZE -- how many bits altogether we're talking about:
 *              obviously: define(WORDSIZE, 32)
 * A derived constant:
 *      TOPBITS -- how many bits are in the top "decade" of a number:
 *              define(TOPBITS, eval( WORDSIZE - N*((WORDSIZE-1)/N) ) )
 * Important variables are:
 *      Q -- the partial quotient under development -- initially 0
 *      R -- the remainder so far -- initially == the dividend
 *      ITER -- number of iterations of the main division loop which will
 *              be required. Equal to CEIL( lg2(quotient)/N )
 *              Note that this is log_base_(2^N) of the quotient.
 *      V -- the current comparand -- initially divisor*2^(ITER*N-1)
 * Cost:
 *      current estimate for non-large dividend is
 *              CEIL( lg2(quotient) / N ) x ( 10 + 7N/2 ) + C
 *      a large dividend is one greater than 2^(31-TOPBITS) and takes a
 *      different path, as the upper bits of the quotient must be developed
 *      one bit at a time.
 *      This uses the m4 and cpp macro preprocessors.
 */

#include "sw_trap.h"

define(dividend, `%i0')
define(divisor, `%i1')
define(Q, `%i2')
define(R, `%i3')
define(ITER, `%l0')
define(V, `%l1')
define(SIGN, `%l2')
define(T, `%l3')        ! working variable
define(SC, `%l4')

/*
 * This is the recursive definition of how we develop quotient digits.
 * It takes three important parameters:
 *      $1 -- the current depth, 1<=$1<=N
 *      $2 -- the current accumulation of quotient bits
 *      N -- max depth
 * We add a new bit to $2 and either recurse or insert the bits in the quotient.
 * Dynamic input:
 *      R -- current remainder
 *      Q -- current quotient
 *      V -- current comparand
 *      cc -- set on current value of R
 * Dynamic output:
 *      R', Q', V', cc'
 */

define(DEVELOP_QUOTIENT_BITS,
`       !depth $1, accumulated bits $2
        bl      L.$1.eval(2^N+$2)
        srl     V,1,V
        ! remainder is positive
        subcc   R,V,R
        ifelse( $1, N,
        `       b       9f
                add     Q, ($2*2+1), Q
        ',`     DEVELOP_QUOTIENT_BITS( incr($1), `eval(2*$2+1)')
        ')
L.$1.eval(2^N+$2):      ! remainder is negative
        addcc   R,V,R
        ifelse( $1, N,
        `       b       9f
                add     Q, ($2*2-1), Q
        ',`     DEVELOP_QUOTIENT_BITS( incr($1), `eval(2*$2-1)')
        ')
        ifelse( $1, 1, `9:')
')
ifelse( ANSWER, `quotient', `
        .global .div, .udiv
.udiv:                          ! UNSIGNED DIVIDE
        save    %sp,-64,%sp
        b       divide
        mov     0,SIGN          ! result always positive

.div:                           ! SIGNED DIVIDE
        save    %sp,-64,%sp
        orcc    divisor,dividend,%g0    ! are either dividend or divisor negative
        bge     divide          ! if not, skip this junk
        xor     divisor,dividend,SIGN   ! record sign of result in sign of SIGN
        tst     divisor
        bge     2f
        tst     dividend
        ! divisor < 0
        bge     divide
        neg     divisor
2:
        ! dividend < 0
        neg     dividend
        ! FALL THROUGH
',`
        .global .rem, .urem
.urem:                          ! UNSIGNED REMAINDER
        save    %sp,-64,%sp     ! do this for debugging
        b       divide
        mov     0,SIGN          ! result always positive

.rem:                           ! SIGNED REMAINDER
        save    %sp,-64,%sp     ! do this for debugging
        orcc    divisor,dividend,%g0    ! are either dividend or divisor negative
        bge     divide          ! if not, skip this junk
        mov     dividend,SIGN   ! record sign of result in sign of SIGN
        tst     divisor
        bge     2f
        tst     dividend
        ! divisor < 0
        bge     divide
        neg     divisor
2:
        ! dividend < 0
        neg     dividend
        ! FALL THROUGH
')

divide:
        ! compute size of quotient, scale comparand
        orcc    divisor,%g0,V   ! movcc divisor,V
        te      ST_DIV0         ! if divisor = 0
        mov     dividend,R
        mov     0,Q
        sethi   %hi(1<<(WORDSIZE-TOPBITS-1)),T
        cmp     R,T
        blu     not_really_big
        mov     0,ITER
        !
        ! Here, the dividend is >= 2^(31-N) or so. We must be careful here, as
        ! our usual N-at-a-shot divide step will cause overflow and havoc. The
        ! total number of bits in the result here is N*ITER+SC, where SC <= N.
        ! Compute ITER, in an unorthodox manner: know we need to Shift V into
        ! the top decade: so don't even bother to compare to R.
        !
1:
        cmp     V,T
        bgeu    3f
        mov     1,SC
        sll     V,N,V
        b       1b
        inc     ITER

        ! Now compute SC
2:      addcc   V,V,V
        bcc     not_too_big     ! bcc
        add     SC,1,SC

        ! We're here if the divisor overflowed when Shifting.
        ! This means that R has the high-order bit set.
        ! Restore V and subtract from R.
        sll     T,TOPBITS,T     ! high order bit
        srl     V,1,V           ! rest of V
        add     V,T,V
        b       do_single_div
        dec     SC
not_too_big:
3:      cmp     V,R
        blu     2b
        nop
        be      do_single_div
        nop
        ! V > R: went too far: back up 1 step
        !       srl     V,1,V
        !       dec     SC
        ! do single-bit divide steps
        !
        ! We have to be careful here. We know that R >= V, so we can do the
        ! first divide step without thinking. BUT, the others are conditional,
        ! and are only done if R >= 0. Because both R and V may have the high-
        ! order bit set in the first step, just falling into the regular
        ! division loop will mess up the first time around.
        ! So we unroll slightly...
do_single_div:
        deccc   SC
        bl      end_regular_divide
        nop
        sub     R,V,R
        mov     1,Q
        b       end_single_divloop
        nop
single_divloop:
        sll     Q,1,Q
        bl      1f
        srl     V,1,V
        ! R >= 0
        sub     R,V,R
        b       2f
        inc     Q
1:      ! R < 0
        add     R,V,R
        dec     Q
2:
end_single_divloop:
        deccc   SC
        bge     single_divloop
        tst     R
        b       end_regular_divide
        nop

not_really_big:
1:
        sll     V,N,V
        cmp     V,R
        bleu    1b
        inccc   ITER
        be      got_result
        dec     ITER

do_regular_divide:
        ! Do the main division iteration
        tst     R
        ! Fall through into divide loop
divloop:
        sll     Q,N,Q
        DEVELOP_QUOTIENT_BITS( 1, 0 )
end_regular_divide:
        deccc   ITER
        bge     divloop
        tst     R
        bge     got_result
        nop
        ! non-restoring fixup here
ifelse( ANSWER, `quotient',
`       dec     Q
',`     add     R,divisor,R
')

got_result:
        tst     SIGN
        bge     1f
        restore
        ! answer < 0
        retl                    ! leaf-routine return
ifelse( ANSWER, `quotient',
`       neg     %o2,%o0         ! quotient <- -Q
',`     neg     %o3,%o0         ! remainder <- -R
')
1:      retl                    ! leaf-routine return
ifelse( ANSWER, `quotient',
`       mov     %o2,%o0         ! quotient <- Q
',`     mov     %o3,%o0         ! remainder <- R
')
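
The m4 macro DEVELOP_QUOTIENT_BITS in Program 6 is expanded at assembly time into an N-deep tree of straight-line code. The C sketch below models the same recursion at run time, to show how the accumulated digit ($2 in the macro) is built up as 2*digit+1 or 2*digit-1 at each level. It is an assumption-laden model rather than a translation: the names develop_quotient_bits and udiv_model are hypothetical, 64-bit intermediates stand in for the separate large-dividend path, and only the unsigned quotient is computed.

```c
#include <assert.h>
#include <stdint.h>

#define N 3     /* quotient bits developed per main-loop iteration */

/* Run-time model of the compile-time DEVELOP_QUOTIENT_BITS recursion.
 * depth runs from 1 to N; acc is the signed digit accumulated so far. */
static int64_t develop_quotient_bits(int depth, int64_t acc, int64_t *R, int64_t *V)
{
    int negative = (*R < 0);    /* bl tests the condition codes set on R */

    *V /= 2;                    /* srl V,1,V sits in the branch delay slot */
    if (negative) { *R += *V; acc = 2 * acc - 1; }   /* addcc R,V,R */
    else          { *R -= *V; acc = 2 * acc + 1; }   /* subcc R,V,R */
    return (depth == N) ? acc : develop_quotient_bits(depth + 1, acc, R, V);
}

/* Unsigned divide built on the model above. */
static uint32_t udiv_model(uint32_t dividend, uint32_t divisor)
{
    int64_t R = dividend, V = divisor, Q = 0;
    int iter = 0;

    assert(divisor != 0);       /* Program 6 takes the ST_DIV0 trap instead */
    do {                        /* scale the comparand past the dividend */
        V *= (int64_t)1 << N;
        iter += 1;
    } while (V <= R);
    for (; iter > 0; iter -= 1) {
        /* sll Q,N,Q followed by one expansion of the decision tree */
        Q = Q * ((int64_t)1 << N) + develop_quotient_bits(1, 0, &R, &V);
    }
    if (R < 0) {                /* non-restoring fix-up */
        R += divisor;
        Q -= 1;
    }
    return (uint32_t)Q;
}
```

With N set to 3 the leaves of the tree carry the odd digits from -7 to +7, so each main-loop iteration contributes three quotient bits.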
F.1. Introduction

This appendix lists the opcodes and condition codes.



op      instruction

01      CALL



op2     instruction

000     UNIMP
001     unimplemented
010     Bicc
011     unimplemented
100     SETHI
101     unimplemented
110     FBfcc
111     CBccc
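
As a hedged illustration of how these tables are indexed, the following C sketch extracts the op, op2 and op3 fields from a 32-bit instruction word. It assumes the standard SPARC field positions (op in bits 31:30, op2 in bits 24:22, op3 in bits 24:19) and that op2 is decoded only when op is 00; the function and array names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for a 32-bit SPARC instruction word. */
static unsigned field_op(uint32_t insn)  { return (insn >> 30) & 0x3u; }
static unsigned field_op2(uint32_t insn) { return (insn >> 22) & 0x7u; }
static unsigned field_op3(uint32_t insn) { return (insn >> 19) & 0x3fu; }

/* op2 names, in the order of the op2 table. */
static const char *const op2_names[8] = {
    "UNIMP", "unimplemented", "Bicc", "unimplemented",
    "SETHI", "unimplemented", "FBfcc", "CBccc"
};

/* Name of a format 2 instruction (op = 00), looked up by its op2 field. */
static const char *format2_name(uint32_t insn)
{
    assert(field_op(insn) == 0);
    return op2_names[field_op2(insn)];
}
```

For instance, the word 0x01000000 (the usual encoding of nop, a SETHI with a zero immediate and destination %g0) has op = 00 and op2 = 100.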



F-1 



Opcodes and Condition Codes 



F-1 



Solbourne Computer, Inc. 



op3       instruction

000000    ADD
000001    AND
000010    OR
000011    XOR
000100    SUB
000101    ANDN
000110    ORN
000111    XNOR
001000    ADDX
001001    unimplemented
001010    unimplemented
001011    unimplemented
001100    SUBX
001101    unimplemented
001110    unimplemented
001111    unimplemented
010000    ADDcc
010001    ANDcc
010010    ORcc
010011    XORcc
010100    SUBcc
010101    ANDNcc
010110    ORNcc
010111    XNORcc
011000    ADDXcc
011001    unimplemented
011010    unimplemented
011011    unimplemented
011100    SUBXcc
011101    unimplemented
011110    unimplemented
011111    unimplemented
100000    TADDcc
100001    TSUBcc
100010    TADDccTV
100011    TSUBccTV
100100    MULScc
100101    SLL
100110    SRL
100111    SRA
101000    RDY
101001    RDPSR
101010    RDWIM
101011    RDTBR
101100    unimplemented
101101    unimplemented
101110    unimplemented
101111    unimplemented
110000    WRY
110001    WRPSR
110010    WRWIM
110011    WRTBR
110100    FPop1
110101    FPop2
110110    CPop1
110111    CPop2
111000    JMPL
111001    RETT
111010    Ticc
111011    IFLUSH
111100    SAVE
111101    RESTORE
111110    unimplemented
111111    unimplemented

op3              instruction

000000           LD
000001           LDUB
000010           LDUH
000011           LDD
000100           ST
000101           STB
000110           STH
000111           STD
001000           unimplemented
001001           LDSB
001010           LDSH
001011           unimplemented
001100           unimplemented
001101           LDSTUB
001110           unimplemented
001111           SWAP
010000           LDA
010001           LDUBA
010010           LDUHA
010011           LDDA
010100           STA
010101           STBA
010110           STHA
010111           STDA
011000           unimplemented
011001           LDSBA
011010           LDSHA
011011           unimplemented
011100           unimplemented
011101           LDSTUBA
011110           unimplemented
011111           SWAPA
100000           LDF
100001           LDFSR
100010           unimplemented
100011           LDDF
100100           STF
100101           STFSR
100110           STDFQ
100111           STDF
101000-101111    unimplemented
110000           LDC
110001           LDCSR
110010           unimplemented
110011           LDDC
110100           STC
110101           STCSR
110110           STDCQ
110111           STDC
111000-111111    unimplemented



opf          instruction

000000001    FMOVs
000000101    FNEGs
000001001    FABSs
000101001    FSQRTs
000101010    FSQRTd
000101011    FSQRTx
001000001    FADDs
001000010    FADDd
001000011    FADDx
001000101    FSUBs
001000110    FSUBd
001000111    FSUBx
001001001    FMULs
001001010    FMULd
001001011    FMULx
001001101    FDIVs
001001110    FDIVd
001001111    FDIVx
011000100    FiTOs
011000110    FdTOs
011000111    FxTOs
011001000    FiTOd
011001001    FsTOd
011001011    FxTOd
011001100    FiTOx
011001101    FsTOx
011001110    FdTOx
011010001    FsTOi
011010010    FdTOi
011010011    FxTOi






opf          instruction

001010001    FCMPs
001010010    FCMPd
001010011    FCMPx
001010101    FCMPEs
001010110    FCMPEd
001010111    FCMPEx



cond    test

0000    never
0001    equal
0010    less than or equal
0011    less than
0100    less than or equal, unsigned
0101    carry set (less than, unsigned)
0110    negative
0111    overflow set
1000    always
1001    not equal
1010    greater than
1011    greater than or equal
1100    greater than, unsigned
1101    carry clear (greater than or equal, unsigned)
1110    positive
1111    overflow clear
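
The sixteen integer conditions above can be evaluated from the four integer condition code bits. The sketch below is a minimal model, assuming the usual SPARC definitions of the tests in terms of N, Z, V and C (for example, "less than" is N xor V); the struct and function names are hypothetical.

```c
#include <assert.h>

/* Integer condition codes as four separate flags, each 0 or 1. */
struct icc { unsigned n, z, v, c; };

/* Returns 1 when the 4-bit cond field is satisfied by the given flags. */
static int icc_test(unsigned cond, struct icc f)
{
    int result;

    assert(cond < 16);
    switch (cond & 0x7u) {
    case 0x0: result = 0;                          break; /* never */
    case 0x1: result = (int)f.z;                   break; /* equal */
    case 0x2: result = (int)(f.z | (f.n ^ f.v));   break; /* less than or equal */
    case 0x3: result = (int)(f.n ^ f.v);           break; /* less than */
    case 0x4: result = (int)(f.c | f.z);           break; /* less than or equal, unsigned */
    case 0x5: result = (int)f.c;                   break; /* carry set */
    case 0x6: result = (int)f.n;                   break; /* negative */
    default:  result = (int)f.v;                   break; /* overflow set */
    }
    /* Conditions 1000 through 1111 are the complements of 0000 through 0111. */
    return (cond & 0x8u) ? !result : result;
}
```

The complement relationship is visible in the table itself: each entry in the lower half is the negation of the entry eight rows above it.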



cond    test

0000    never
0001    not equal
0010    less than or greater than
0011    unordered or less than
0100    less than
0101    unordered or greater than
0110    greater than
0111    unordered
1000    always
1001    equal
1010    unordered or equal
1011    greater than or equal
1100    unordered or greater than or equal
1101    less than or equal
1110    unordered or less than or equal
1111    ordered

opcode    cond    bp_CP_cc[1:0] test

CBN       0000    Never
CB123     0001    1 or 2 or 3
CB12      0010    1 or 2
CB13      0011    1 or 3
CB1       0100    1
CB23      0101    2 or 3
CB2       0110    2
CB3       0111    3
CBA       1000    Always
CB0       1001    0
CB03      1010    0 or 3
CB02      1011    0 or 2
CB023     1100    0 or 2 or 3
CB01      1101    0 or 1
CB013     1110    0 or 1 or 3
CB012     1111    0 or 1 or 2
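
Because each coprocessor branch simply names the set of bp_CP_cc values for which it is taken, the table can be encoded as sixteen 4-bit masks. The C sketch below shows one such encoding; the array and function names are hypothetical, and bit i of a mask stands for condition-code value i.

```c
#include <assert.h>

/* Bit i of each mask is set when the branch is taken for condition-code
 * value i (0 to 3); the array index is the 4-bit cond field. */
static const unsigned char cb_mask[16] = {
    0x0, 0xE, 0x6, 0xA, 0x2, 0xC, 0x4, 0x8,  /* CBN CB123 CB12 CB13 CB1 CB23 CB2 CB3 */
    0xF, 0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7   /* CBA CB0 CB03 CB02 CB023 CB01 CB013 CB012 */
};

/* Returns 1 when a branch with the given cond field is taken for cc. */
static int cb_taken(unsigned cond, unsigned cc)
{
    assert(cond < 16 && cc < 4);
    return (cb_mask[cond] >> cc) & 1;
}
```

As with the integer conditions, each mask in the second row is the complement of the mask directly above it.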